SOUTHERN REGIONAL GRADUATE SUMMER SESSIONS IN STATISTICS 


The fourth Southern Regional Graduate Summer Session in Statistics will be held 
from June 12 through July 20 at the Virginia Polytechnic Institute, Blacksburg, Virginia. 

The session lasts six weeks and each course offered carries approximately five quarter 
hours of graduate credit. The summer work in statistics may be applied towards residence 
requirements at any one of the cooperating institutions, as well as certain other institutions, 
in partial fulfillment of residence requirements for graduate degrees. 

The faculty for the 1957 session will include E. J. Williams, D. B. DeLury, J. L. McHugh, 
as well as the following staff members from the Virginia Polytechnic Institute: W. O. Ash, 
L. S. Brenna, R. A. Bradley, C. W. Clunies-Ross, J. E. Freund, R. J. Freund, B. Harsh- 
barger, C. Y. Kramer, and R. L. Wine. 

Of particular interest will be the lectures by D. B. DeLury on the Sampling of Bio- 
logical Populations. Other courses to be offered include Analysis of Variance, Rank Order 
Statistics, Stochastic Processes, Probability, Statistical Inference, Theory of Least Squares, 
Statistical Methods, Engineering Statistics, and Sampling. Seminars, which will include 
many of the foremost statisticians in the eastern part of the United States, will be con- 
ducted each afternoon Monday through Thursday from 3:00 to 4:30. These seminars 
will be on some of the more recent research in the field of statistics. 

The total tuition fee will be $38.00 for the six-week term. Doctoral courtesy will be 
offered to those holding doctoral degrees. Living and other expenses at the Virginia Poly- 
technic Institute are reasonable. The Virginia Polytechnic Institute is located on the 
scenic Alleghany Mountain plateau 2100 feet above sea level. The summer climate is 
delightful. 


Inquiries should be addressed to Boyd Harshbarger, Head, Department of Statistics, Virginia 
Polytechnic Institute, Blacksburg, Virginia. 
NEW ANALYSIS OF VARIANCE FORMULAS FOR TREATING 
DATA FROM MUTUALLY PAIRED SUBJECTS* 


JOSEPH LEV† AND ELAINE F. KINDER‡ 
YERKES LABORATORIES OF PRIMATE BIOLOGY, ORANGE PARK, FLORIDA 


The experimental design considered in this paper is one in which each 
of a group of several subjects is observed in the presence of each of the other 
subjects of the group, the entire set of possible pairings being repeated or 
replicated on several occasions. Analysis of variance formulas are described 
for this somewhat unusual design. Both the model of constants and mixed 
model are considered. Reliability formulas growing out of the analysis of 
variance calculations are developed. 


This paper describes new formulas in the analysis of variance. The 
novelty arises from the experimental design in which observations were 
made on pairs of chimpanzees of a group, each animal being paired with 
every other one. The formulas of analysis of variance usually given in text- 
books are not appropriate in this situation because of the incompleteness of 
the layout. An animal cannot be paired with itself; consequently cells which 
correspond to this pairing are blank. 

In the study for which this analysis of variance was developed, observa- 
tions on all pairs were repeated five times at irregular intervals over a period 
of five years. Because of these repetitions it was possible to study interactions 
between pairs of animals as well as other interactions. 

An analysis of variance for a design involving mutual pairings of a group 
of individuals when the observations are made once only is described by 
Quenouille [3, p. 256]. The analysis described by Quenouille does not consider 
repetitions or interactions. Although the analysis described in this paper 
was developed for an experiment dealing with chimpanzees, it is clearly 
applicable to experiments of the same design. 

Each score in this study is a measure of the activity of a chimpanzee 
observed for a session of 20 minutes. During the session the chimpanzee was 
alone in a cage sufficiently large to permit free movement. The cage contained 
several items of equipment, such as two shelves, one high and one low, a 
strap suspended from the ceiling on which the animal could swing, and some 
toys. A second animal was placed in a small adjoining cage separated from 
the first cage only by a grating so that the two animals were in full view of 
each other. 


*This work was supported in part by grants M-627; M-627C from the National 
Institutes of Health, Public Health Service. 

†New York State Department of Civil Service. 

‡New York State Department of Mental Hygiene. 
A continuous second by second record was kept of the behavior movements 
of the first animal, whom we shall call the protagonist. No comparable 
record was kept of the behavior of the second animal, the partner, whose 
confining cage prevented escape from the grating, and limited its movement. 
Each protagonist was paired with six partners in successive observational 
sessions. The purpose of introducing the partner into the study was to ascertain 
whether the activity of the protagonist varied consistently with different 
partners. 

Each of the seven chimpanzee subjects was paired with every other 
subject in the protagonist-partner relationship, making 42 pairings in each 
series of observations. The record for each session was later scored to obtain 
a summary activity measure, which was in essence a weighted average of 
scores for the various behavioral acts of the protagonist during the session. 

Observations involving all 42 pairings were repeated several times 
over a period of five years, giving the five sets of observations which are 
analyzed in this paper. A set of sessions involving all 42 pairings and carried 
out at approximately the same time will be referred to as a replication of the 
experiment. The dates of the five replications were October 1943, November 
1943, October 1944, April 1948 and July 1948. 

The activity scores obtained in the first observational period, October 
1943, are shown in Table 1. In this table the chimpanzees acting as protag- 
onists are named in the column headings; their partners are named in the 
side headings of the rows. A score is the activity measure of the chimpanzee 
named in the heading of the column in which the score occurs when this 
chimpanzee is paired with the chimpanzee named in the corresponding row 
heading. For example, the score 252 in the column headed Falla is the activity 
measure of Falla in the presence of Banka. The activity measure of Banka 
in the presence of Falla is 189. The table immediately suggests analysis of 
variance as a natural method for analyzing the data. The complete analysis 
involves joint consideration of three factors and their interactions. 

Factor 1: Differences among protagonists. This analysis should answer 
the question whether the observed differences among chimpanzees as pro- 
tagonists are small enough to be attributed to chance or sufficiently great 
to be regarded as essential differences among the animals. 

Factor 2: Differences among partners. This analysis is made to determine 
whether the observed differences in activity stimulated by partners are of 
such magnitude that they may be attributed to chance. 

Factor 3: Differences among replications. Differences in activity among 
replications may be attributable to age of animals, to the time between the 
replications, or to their order. 

The interactions among these factors are also of interest in the considera- 
tion of the data at hand. 

Error. The chief error component used in the analysis of variance is the 
second-order interaction. This is always the error component for testing 
significance of the interactions. However, the significance of the factors, or 
main effects, may also be tested by some of the first-order interactions. 
The factors and their interactions will be shown in relation to the activity 
data of this study in several tables. A complete display of all scores would 
require five tables like Table 1, each showing scores for 42 pairings for a 
given replication, i.e., four tables in addition to Table 1. These four tables 
will be omitted. The entries in Tables 2, 3, and 4 are totals of entries based on 
the five basic tables. 


TABLE 1 
Activity Scores of Seven Chimpanzees Each 
Paired with Every Other Chimpanzee 
(October 1943) 

                              Protagonist 
Partner    Banka   Falla   Fanny   Flora     Jed    Jent   Karla    Total 
Banka          -     252     199     168     197     165     148     1129 
Falla        189       -     252     211     155     215     264     1286 
Fanny        188     279       -     205     150     186     141     1149 
Flora        160     259     206       -     195     155     144     1119 
Jed          179     305     219     252       -     231     157     1343 
Jent         178     266     255     211     150       -     203     1263 
Karla        170     217     278     189     163     207       -     1224 
Total       1064    1578    1409    1236    1010    1159    1057     8513 


In Table 2 each entry is the sum of the scores for five pairings of a 
particular protagonist with a particular partner, each of the five scores 
being a measure of activity at a given session. In Table 3 each entry is the 
sum of the activity scores of six protagonists paired with a given partner at 
a given replication. In Table 4 each entry is the sum of the activity scores 
of a protagonist paired with each of his six partners. 
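
The way the marginal tables are built from a basic table can be sketched in Python, using the October 1943 scores of Table 1. The function names are illustrative and are not part of the original paper; `None` is simply one convenient way to mark the self-pairings that cannot occur.

```python
# Table 1 (October 1943): rows are partners, columns are protagonists.
# None marks the impossible self-pairings on the diagonal.
NAMES = ["Banka", "Falla", "Fanny", "Flora", "Jed", "Jent", "Karla"]
TABLE_1 = [
    [None, 252, 199, 168, 197, 165, 148],
    [189, None, 252, 211, 155, 215, 264],
    [188, 279, None, 205, 150, 186, 141],
    [160, 259, 206, None, 195, 155, 144],
    [179, 305, 219, 252, None, 231, 157],
    [178, 266, 255, 211, 150, None, 203],
    [170, 217, 278, 189, 163, 207, None],
]


def partner_totals(table):
    """Row sums: one column of Table 3 (sums over the six protagonists)."""
    return [sum(x for x in row if x is not None) for row in table]


def protagonist_totals(table):
    """Column sums: one row of Table 4 (sums over the six partners)."""
    size = len(table)
    return [sum(row[j] for row in table if row[j] is not None) for j in range(size)]


row_totals = partner_totals(TABLE_1)
col_totals = protagonist_totals(TABLE_1)
grand_total = sum(row_totals)

for name, by_partner, by_protagonist in zip(NAMES, row_totals, col_totals):
    print(f"{name:6s} as partner: {by_partner:5d}   as protagonist: {by_protagonist:5d}")
print("Replication total:", grand_total)
```

Repeating the same two sums for each of the five replications would fill Tables 3 and 4; adding the five basic tables cell by cell would give Table 2.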

In order to visualize the relationship between the basic activity scores 
like those shown in Table 1 and the sums shown in Tables 2, 3 and 4, the 
reader will note that the totals of the rows in Table 1 are recorded in the first 
column of Table 3, and the totals of the columns in Table 1 are the entries in 
the first row of Table 4. The remaining entries in Tables 2, 3, 4 are obtained 
similarly. 


TABLE 2 
Sums of Activity Scores for Five Pairings 
of Protagonist and Partner 

                              Protagonist 
Partner    Banka   Falla   Fanny   Flora     Jed    Jent   Karla    Total 
Banka          -    1021     996     757     787     776     898     5235 
Falla        709       -    1123     825     644     756     981     5038 
Fanny        799    1070       -     813     679     767     834     4962 
Flora        714    1121     996       -     770     755     808     5164 
Jed          800    1161     929     866       -     811     890     5457 
Jent         808    1016    1118     848     694       -     917     5401 
Karla        735    1123    1212     882     750     794       -     5496 
Total       4565    6512    6374    4991    4324    4659    5328    36753 


TABLE 3 
Sums of Activity Scores of Six 
Protagonists Paired with a Given Partner 

                          Replication 
            Oct.    Nov.    Oct.   April    July 
Partner     1943    1943    1944    1948    1948    Total 
Banka       1129     950    1167     980    1009     5235 
Falla       1286     911     993     977     871     5038 
Fanny       1149     969    1126     826     892     4962 
Flora       1119    1131    1081     934     899     5164 
Jed         1343    1021    1065    1089     939     5457 
Jent        1263     992    1232     995     919     5401 
Karla       1224    1150    1173    1011     938     5496 
Total       8513    7124    7837    6812    6467    36753 


TABLE 4 
Sums of Activity Scores for Each 
Protagonist Paired with Six Partners 

                                Protagonist 
Replication    Banka   Falla   Fanny   Flora     Jed    Jent   Karla    Total 
Oct. 1943       1064    1578    1409    1236    1010    1159    1057     8513 
Nov. 1943        918    1143    1274     927     873    1043     946     7124 
Oct. 1944       1060    1592    1143    1091     967     750    1234     7837 
Apr. 1948        814    1053    1358     862     718    1018     989     6812 
July 1948        709    1146    1190     875     756     689    1102     6467 
Total           4565    6512    6374    4991    4324    4659    5328    36753 


Formulas and computations for the analysis of variance will now be 
presented. The mathematical basis for the formulas will be sketched briefly 
in the mathematical appendix. The following symbolism will be needed for 
the formulas: 


$X_{ijk}$ = the basic activity score resulting from a single observation session 
in which the $i$th partner is paired with the $j$th protagonist in the 
$k$th replication. 

For the study at hand $i$ ranges from 1 through 7, 
$j$ ranges from 1 through 7, 
$k$ ranges from 1 through 5, 
$i \neq j$ for the same observation. 

$X_{ij.}$ = the sum of all observations for the $i$th partner and the $j$th protagonist, 
the summation being made over all replications. This is an 
entry in Table 2. 

$X_{i.k}$ = the sum of all activity scores for the $i$th partner in the $k$th replication, 
the summation extending over all protagonists. This is an entry in 
Table 3. 

$X_{.jk}$ = the sum of all observations for the $j$th protagonist in the $k$th replication. 
This is an entry in Table 4. 

$X_{i..}$ = the sum of all observations on the $i$th partner for all protagonists 
and all replications. It is an entry in the column headed "Total" in 
Table 2. 

$X_{.j.}$ = the sum of all observations on the $j$th protagonist, the summation 
extending over all partners and all replications. It is an entry in the 
row headed "Total" in Table 2 (also Table 4). 

$X_{..k}$ = the sum of all observations in the $k$th replication. It is an entry in 
the row headed "Total" in Table 3. 

$X_{...}$ = the total of all observations in all replications. 

$p$ = the number of animals in the study. 

$r$ = the number of replications. 


In the following formulas SS is an abbreviation for sum of squares, 
MS for mean square, df for degrees of freedom, and F is the usual ratio of 
mean squares. For variation among partners: 


$$
SS = \frac{p-1}{rp(p-2)} \sum_i X_{i..}^2 + \frac{1}{rp(p-1)(p-2)} \sum_j X_{.j.}^2 + \frac{2}{rp(p-2)} \sum_i X_{i..} X_{.i.} - \frac{X_{...}^2}{r(p-1)(p-2)}, \tag{1}
$$

df = $p - 1$. 


For variation among protagonists: 














$$
SS = \frac{p-1}{rp(p-2)} \sum_j X_{.j.}^2 + \frac{1}{rp(p-1)(p-2)} \sum_i X_{i..}^2 + \frac{2}{rp(p-2)} \sum_i X_{i..} X_{.i.} - \frac{X_{...}^2}{r(p-1)(p-2)}, \tag{2}
$$

df = $p - 1$. 
For variation among replications: 
$$
SS = \frac{1}{p(p-1)} \sum_k X_{..k}^2 - \frac{X_{...}^2}{rp(p-1)}, \tag{3}
$$

df = $r - 1$. 
For partner × protagonist interaction: 

$$
SS = \frac{1}{r} \sum_i \sum_j X_{ij.}^2 - \frac{p-1}{rp(p-2)} \Big( \sum_i X_{i..}^2 + \sum_j X_{.j.}^2 \Big) - \frac{2}{rp(p-2)} \sum_i X_{i..} X_{.i.} + \frac{X_{...}^2}{r(p-1)(p-2)}, \tag{4}
$$

df = $p^2 - 3p + 1$. 


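For partner × replication interaction: 

$$
\begin{aligned}
SS = {} & \frac{p-1}{p(p-2)} \sum_i \sum_k X_{i.k}^2 + \frac{1}{p(p-1)(p-2)} \sum_j \sum_k X_{.jk}^2 + \frac{2}{p(p-2)} \sum_i \sum_k X_{i.k} X_{.ik} \\
& - \frac{p-1}{rp(p-2)} \sum_i X_{i..}^2 - \frac{1}{rp(p-1)(p-2)} \sum_j X_{.j.}^2 - \frac{1}{(p-1)(p-2)} \sum_k X_{..k}^2 \\
& - \frac{2}{rp(p-2)} \sum_i X_{i..} X_{.i.} + \frac{X_{...}^2}{r(p-1)(p-2)},
\end{aligned} \tag{5}
$$

df = $(p - 1)(r - 1)$. 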


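For protagonist × replication interaction: 

$$
\begin{aligned}
SS = {} & \frac{p-1}{p(p-2)} \sum_j \sum_k X_{.jk}^2 + \frac{1}{p(p-1)(p-2)} \sum_i \sum_k X_{i.k}^2 + \frac{2}{p(p-2)} \sum_i \sum_k X_{i.k} X_{.ik} \\
& - \frac{p-1}{rp(p-2)} \sum_j X_{.j.}^2 - \frac{1}{rp(p-1)(p-2)} \sum_i X_{i..}^2 - \frac{1}{(p-1)(p-2)} \sum_k X_{..k}^2 \\
& - \frac{2}{rp(p-2)} \sum_i X_{i..} X_{.i.} + \frac{X_{...}^2}{r(p-1)(p-2)},
\end{aligned} \tag{6}
$$

df = $(p - 1)(r - 1)$. 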
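For error: 

$$
\begin{aligned}
SS = {} & \sum_i \sum_j \sum_k X_{ijk}^2 - \frac{1}{r} \sum_i \sum_j X_{ij.}^2 - \frac{p-1}{p(p-2)} \Big( \sum_i \sum_k X_{i.k}^2 + \sum_j \sum_k X_{.jk}^2 \Big) \\
& - \frac{2}{p(p-2)} \sum_i \sum_k X_{i.k} X_{.ik} + \frac{1}{(p-1)(p-2)} \sum_k X_{..k}^2 + \frac{p-1}{rp(p-2)} \Big( \sum_i X_{i..}^2 + \sum_j X_{.j.}^2 \Big) \\
& + \frac{2}{rp(p-2)} \sum_i X_{i..} X_{.i.} - \frac{X_{...}^2}{r(p-1)(p-2)},
\end{aligned} \tag{7}
$$

df = $(r - 1)(p^2 - 3p + 1)$. 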
The formulas will be applied to computations relating to the activity 
data of the study. The sum of squares $\sum_i \sum_j \sum_k X_{ijk}^2$ is not available from the 
tables because the original scores are not shown. Additional sums of squares 
will be shown to help the reader, together with their source. 


$\sum_i \sum_j \sum_k X_{ijk}^2$ = 6,830,715, from original measures, 
$\sum_i \sum_j X_{ij.}^2$ = 33,060,491, from entries of Table 2, 
$\sum_i \sum_k X_{i.k}^2$ = 39,164,035, from entries of Table 3, 
$\sum_j \sum_k X_{.jk}^2$ = 40,314,363, from entries of Table 4, 
$\sum_i \sum_k X_{i.k} X_{.ik}$ = 38,728,822, from entries of Tables 3 and 4, 
$\sum_i X_{i..}^2$ = 193,230,675, from row totals of Table 2, 
$\sum_j X_{.j.}^2$ = 197,574,167, from column totals of Table 2, 
$\sum_k X_{..k}^2$ = 272,866,547, from row totals of Table 4, 
$\sum_i X_{i..} X_{.i.}$ = 192,148,558, from totals of Table 2, 
$X_{...}^2$ = 1,350,783,009. 


The analysis of variance is shown in Table 5. 
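
As a check on the arithmetic, the listed sums can be substituted into the formulas for protagonists, replications, partner × protagonist interaction, and error. The following Python sketch does this. The variable names are illustrative, and the coefficients are written out as this edition reads formulas (2), (3), (4) and (7), so the block is a hedged reconstruction, not the authors' own worksheet.

```python
# Sums of squares and products quoted in the text (p = 7 animals, r = 5 replications).
p, r = 7, 5
S_ijk = 6_830_715       # sum of X_ijk^2 (original measures)
S_ij = 33_060_491       # sum of X_ij.^2 (Table 2)
S_ik = 39_164_035       # sum of X_i.k^2 (Table 3)
S_jk = 40_314_363       # sum of X_.jk^2 (Table 4)
P_ik = 38_728_822       # sum of X_i.k * X_.ik (Tables 3 and 4)
S_i = 193_230_675       # sum of X_i..^2 (row totals, Table 2)
S_j = 197_574_167       # sum of X_.j.^2 (column totals, Table 2)
S_k = 272_866_547       # sum of X_..k^2 (replication totals)
P_i = 192_148_558       # sum of X_i.. * X_.i.
G2 = 1_350_783_009      # X_...^2

a = r * p * (p - 2)             # 175
b = r * p * (p - 1) * (p - 2)   # 1050
c = r * (p - 1) * (p - 2)       # 150
d = p * (p - 2)                 # 35

# Formula (2): protagonists.
ss_protagonists = (p - 1) * S_j / a + S_i / b + 2 * P_i / a - G2 / c
# Formula (3): replications.
ss_replications = S_k / (p * (p - 1)) - G2 / (r * p * (p - 1))
# Formula (4): partner x protagonist interaction.
ss_partner_x_protagonist = S_ij / r - (p - 1) * (S_i + S_j) / a - 2 * P_i / a + G2 / c
# Formula (7): error.
ss_error = (S_ijk - S_ij / r - (p - 1) * (S_ik + S_jk) / d - 2 * P_ik / d
            + S_k / ((p - 1) * (p - 2)) + (p - 1) * (S_i + S_j) / a
            + 2 * P_i / a - G2 / c)

df_error = (r - 1) * (p * p - 3 * p + 1)   # 116
ms_error = ss_error / df_error
ms_protagonists = ss_protagonists / (p - 1)
f_protagonists = ms_protagonists / ms_error

print(round(ss_protagonists), round(ss_replications),
      round(ss_partner_x_protagonist), round(ss_error))
print(round(ms_protagonists, 1), round(ms_error, 1), round(f_protagonists, 2))
```

The four sums of squares agree with the corresponding rows of Table 5 to within about one unit, which is the size of the rounding in the published table.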

The analysis of variance in Table 5 is based on a model in which the 
components of a measure are constants, except for the error component. 
Consequently the conclusions are valid only for the seven animals in the 
study. 

It is more interesting to study a second model in which the seven animals 
are regarded as a random sample from a population of similar animals. This 
is a mixed model described in the mathematical appendix. All components 
involving animals are then treated as random variables. Only the general 
mean and the component due to the main effect of replications are constants 
in this model. Using this model the interactions are still tested as in Table 5, 
but the main effects require more complex tests. 

Tests for partner, or protagonist, main effects are made directly by the 
ratio of mean square of the main effect to the partner X protagonist inter- 
action. Thus the test for significance of differences among protagonists is 


F = 24,794.0/769.3 = 32.2, 


which is still significant at the 0.5 per cent level. 
The test for significance of variation among replications is more complex. 
Here 

$$
\begin{aligned}
y_1 &= \frac{(p-1)^2}{p(p-2)} \,(\text{mean square of protagonist} \times \text{replication interaction}), \\
y_2 &= \frac{(p-1)^2}{p(p-2)} \,(\text{mean square of partner} \times \text{replication interaction}), \\
y_3 &= \frac{p^2 - 2p + 2}{p(p-2)} \,(\text{mean square of error}).
\end{aligned} \tag{8}
$$

Then 

$$
F = \frac{\text{mean square of replications}}{y_1 + y_2 - y_3}. \tag{9}
$$

This ratio has an F distribution (approximately) with $r - 1$ degrees of 
freedom in the numerator and 

$$
n_2 = \frac{(y_1 + y_2 - y_3)^2}{\dfrac{y_1^2}{(p-1)(r-1)} + \dfrac{y_2^2}{(p-1)(r-1)} + \dfrac{y_3^2}{(r-1)(p^2 - 3p + 1)}} \tag{10}
$$

degrees of freedom in the denominator. 

For the data in Table 5 the test for differences among replications 
indicates that F = 5.25, $n_1$ = 4, and $n_2$ = 26. This value of F also shows 
significance at the 0.5 per cent level. 
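
The replication test can be traced numerically from the mean squares of Table 5. The sketch below is a hedged illustration: the variable names are made up for the example, and the coefficients $(p-1)^2/[p(p-2)]$ and $(p^2-2p+2)/[p(p-2)]$ are the ones read here from formula (8).

```python
# Mean squares taken from Table 5.
p, r = 7, 5
ms_replications = 16130.5
ms_protagonist_x_replication = 2762.0
ms_partner_x_replication = 806.2
ms_error = 569.1

# Formula (8): components of the synthetic denominator.
coef_interaction = (p - 1) ** 2 / (p * (p - 2))        # 36/35
coef_error = (p * p - 2 * p + 2) / (p * (p - 2))       # 37/35
y1 = coef_interaction * ms_protagonist_x_replication
y2 = coef_interaction * ms_partner_x_replication
y3 = coef_error * ms_error

# Formula (9): approximate F ratio.
denominator = y1 + y2 - y3
F = ms_replications / denominator

# Degrees of freedom: exact for the numerator, approximate for the denominator.
n1 = r - 1
df_interaction = (p - 1) * (r - 1)                     # 24
df_error = (r - 1) * (p * p - 3 * p + 1)               # 116
n2 = denominator ** 2 / (y1 ** 2 / df_interaction
                         + y2 ** 2 / df_interaction
                         + y3 ** 2 / df_error)

print(f"F = {F:.2f}, n1 = {n1}, n2 = {n2:.1f}")
```

Rounding `n2` to the nearest whole number gives the 26 denominator degrees of freedom quoted in the text.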

The treatment of the replication component as a constant in the second 
model calls for justification. This treatment was adopted because the replica- 
tions were spread over a long period, with the consequent changes which 
may be attributable to time. In fact the data show a fairly consistent decline 
in average activity over the replications. It would seem illogical, therefore, 
to treat the replications as a random sample from a population of replications. 
Even if the replications had been concentrated within a shorter period, one 
would hesitate to regard them as a random sample because of possible effects 
of order among the replications. 

The second model leads to a measure of the reliability of the mean score 
of a protagonist over all his partners at a single replication, in our symbolism 
$X_{.jk}/(p - 1)$. The formula is developed in the appendix but will be described 
here. 

Sample estimates of the variance components which arise from the 
second model are computed as follows. In the symbolism used, the sub- 
scripts a, b, c refer to partners, protagonists, and replications, respectively. 
A single subscript indicates a main effect or error. A pair of subscripts indicate 
an interaction. Using the mean squares derived from formulas (1) through (7), 


$$
s_e^2 = \text{MS error}, \tag{11}
$$

$$
s_{ac}^2 = \frac{(r-1)(p-1)}{rp(p-2)} \,(\text{MS partner} \times \text{replication} - s_e^2), \tag{12}
$$

$$
s_{bc}^2 = \frac{(r-1)(p-1)}{rp(p-2)} \,(\text{MS protagonist} \times \text{replication} - s_e^2), \tag{13}
$$

$$
s_{ab}^2 = \frac{1}{r} \,(\text{MS partner} \times \text{protagonist} - s_e^2), \tag{14}
$$

$$
s_b^2 = \frac{p-1}{rp(p-2)} \,(\text{MS protagonist} - \text{MS partner} \times \text{protagonist}), \tag{15}
$$

$$
s_a^2 = \frac{p-1}{rp(p-2)} \,(\text{MS partner} - \text{MS partner} \times \text{protagonist}). \tag{16}
$$

The reliability of the mean of a protagonist over all partners at a single 
replication is then 

$$
\text{Replication reliability} = \frac{(p-1)s_b^2 + s_a^2 + s_{ab}^2}{(p-1)s_b^2 + s_a^2 + s_{ab}^2 + (p-1)s_{bc}^2 + s_{ac}^2 + s_e^2}, \tag{17}
$$

which is the correlation between mean scores for the same protagonists and 
partners at separate replications. 


TABLE 5 
Analysis of Variance of Activity Data 
with Second-Order Interaction as Error 

                              Sum of    Degrees of      Mean 
Source of Variation           Squares     Freedom      Square        F 
Partners                        3,580         6          596.7      1.05 
Protagonists                  148,764         6       24,794.0     43.57* 
Replications                   64,522         4       16,130.5     28.34* 
Partner × Protagonist 
  Interaction                  22,311        29          769.3      1.35 
Partner × Replication 
  Interaction                  19,349        24          806.2      1.42 
Protagonist × Replication 
  Interaction                  66,289        24        2,762.0      4.85* 
Error                          66,012       116          569.1       - 
*significant at 0.5% level 


From Table 5, 

$$
\begin{aligned}
s_e^2 &= 569, \\
s_{ac}^2 &= \tfrac{24}{175}(806 - 569) = 32, \\
s_{bc}^2 &= \tfrac{24}{175}(2{,}762 - 569) = 301, \\
s_{ab}^2 &= \tfrac{1}{5}(769 - 569) = 40, \\
s_b^2 &= \tfrac{6}{175}(24{,}794 - 769) = 824, \\
s_a^2 &= \tfrac{6}{175}(596 - 569) = 1,
\end{aligned}
$$

$$
\text{Replication reliability} = \frac{6(824) + 1 + 40}{6(824) + 1 + 40 + 6(301) + 32 + 569} = .674.
$$
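
The variance components and the replication reliability can be recomputed from the mean squares of Table 5. The sketch below is an illustration only: the names are invented for the example, and the partner component is entered directly as the value 1 quoted in the worked example instead of being recomputed.

```python
# Mean squares taken from Table 5.
p, r = 7, 5
ms_protagonist = 24794.0
ms_partner_x_protagonist = 769.3
ms_partner_x_replication = 806.2
ms_protagonist_x_replication = 2762.0
ms_error = 569.1

# Formulas (11) through (15).
s2_e = ms_error
coef_rep = (r - 1) * (p - 1) / (r * p * (p - 2))       # 24/175
s2_ac = coef_rep * (ms_partner_x_replication - s2_e)
s2_bc = coef_rep * (ms_protagonist_x_replication - s2_e)
s2_ab = (ms_partner_x_protagonist - s2_e) / r
s2_b = (p - 1) / (r * p * (p - 2)) * (ms_protagonist - ms_partner_x_protagonist)
s2_a = 1.0  # value quoted in the text's worked example (assumption: used as given)

# Formula (17): replication reliability.
shared = (p - 1) * s2_b + s2_a + s2_ab
replication_reliability = shared / (shared + (p - 1) * s2_bc + s2_ac + s2_e)

print(round(s2_ac), round(s2_bc), round(s2_ab), round(s2_b))
print(round(replication_reliability, 3))
```

Working with unrounded mean squares changes the components only slightly, and the reliability still rounds to the value .674 reported above.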


In addition to replication reliability, there is also partner reliability. 
This measure of reliability is based on the mean score of one protagonist 
with one partner over all replications, $X_{ij.}/r$. The resulting partner 
reliability is then 

$$
\text{Partner reliability} = \frac{r s_b^2}{r s_a^2 + r s_b^2 + r s_{ab}^2 + s_e^2}, \tag{18}
$$

which is the correlation between the mean scores of protagonists computed 
over all replications but with differing partners. For the data in Table 5 
this reliability is .842. 

Where both replications and partners differ, but protagonists remain 
the same, the reliability is 

$$
\text{Partner-replication reliability} = \frac{s_b^2}{s_a^2 + s_b^2 + s_{ab}^2 + s_{ac}^2 + s_{bc}^2 + s_e^2}. \tag{19}
$$

For the data in Table 5 this reliability is .486. 


Mathematical Appendix 


It is the purpose of this appendix to state the mathematical basis for 


the formulas used in the paper. 
The sums of squares were derived by use of the following model: 


$$
X_{ijk} = m + a_i + b_j + c_k + (ab)_{ij} + (ac)_{ik} + (bc)_{jk} + e_{ijk}, \qquad i \neq j,
$$
$$
X_{iik} = 0,
$$

$e_{ijk}$ = a normally distributed variable with mean 0 and variance $\sigma^2$ for 
all combinations of $i$, $j$, $k$. The remaining values $m$, $a_i$, $b_j$, etc. are 
constants satisfying the conditions stated below. 
$m$ = the mean of all observations, 
$a_i$ = the contribution of the $i$th partner, 
$b_j$ = the contribution of the $j$th protagonist, 
$c_k$ = the contribution of the $k$th replication, 
$(ab)_{ij}$ = the interaction of the $i$th partner with the $j$th protagonist, 
$(ac)_{ik}$ = the interaction of the $i$th partner with the $k$th replication, 
$(bc)_{jk}$ = the interaction of the $j$th protagonist with the $k$th replication. 


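The constants satisfy the following conditions: 

$$
\sum_{i=1}^{p} a_i = 0, \quad \sum_{j=1}^{p} b_j = 0, \quad \sum_{k=1}^{r} c_k = 0, \quad (ab)_{ii} = 0, \quad \sum_{i=1}^{p} (ab)_{ij} = 0,
$$
$$
\sum_{j=1}^{p} (ab)_{ij} = 0, \quad \sum_{i=1}^{p} (ac)_{ik} = 0, \quad \sum_{k=1}^{r} (ac)_{ik} = 0, \quad \sum_{j=1}^{p} (bc)_{jk} = 0, \quad \sum_{k=1}^{r} (bc)_{jk} = 0.
$$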


The determination of the sums of squares is carried out by least squares 
using methods described in several well-known references; see for example 
[1], [2] and [5]. A full derivation of formulas (1) through (7) is lengthy and 
will not be carried out here. However, a sketch of the method will be pre- 
sented. 

The likelihood ratio method described in sections 8.3 and 8.4 of [5], 
using particularly Theorem A, will be adopted. To apply this theorem set 
up the sum of squares 

$$
U = \sum_i \sum_j \sum_k [X_{ijk} - m - a_i - b_j - c_k - (ab)_{ij} - (ac)_{ik} - (bc)_{jk}]^2.
$$

Then find values of the constants which minimize $U$, subject to the conditions 
on the constants. If the constants found in this way by least squares are 
distinguished by circumflexes as $\hat{m}$, $\hat{a}_i$, etc., then 

$$
n\hat{\sigma}_0^2 = \sum_i \sum_j \sum_k [X_{ijk} - \hat{m} - \hat{a}_i - \hat{b}_j - \hat{c}_k - (\widehat{ab})_{ij} - (\widehat{ac})_{ik} - (\widehat{bc})_{jk}]^2
$$

is the error variance. 

To test a hypothesis that certain of the constants are zero, similar sums 
of squares are formed with these constants omitted. For example, to test the 
hypothesis that the $a_i$ are all zero, form the sum of squares 

$$
U_1 = \sum_i \sum_j \sum_k [X_{ijk} - m - b_j - c_k - (ab)_{ij} - (ac)_{ik} - (bc)_{jk}]^2.
$$

In terms of the least squares solutions (marked here by tildes), using the attendant conditions, 

$$
n\hat{\sigma}_1^2 = \sum_i \sum_j \sum_k [X_{ijk} - \tilde{m} - \tilde{b}_j - \tilde{c}_k - (\widetilde{ab})_{ij} - (\widetilde{ac})_{ik} - (\widetilde{bc})_{jk}]^2.
$$

Finally the sum of squares for partners is given by 

$$
n\hat{\sigma}_1^2 - n\hat{\sigma}_0^2.
$$


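The least squares estimates of the constants for substitution in $n\hat{\sigma}_0^2$ 
are 

$$
\hat{m} = \frac{X_{...}}{rp(p-1)},
$$
$$
\hat{a}_i = \frac{(p-1)X_{i..} + X_{.i.} - X_{...}}{rp(p-2)},
$$
$$
\hat{b}_j = \frac{(p-1)X_{.j.} + X_{j..} - X_{...}}{rp(p-2)},
$$
$$
\hat{c}_k = \frac{X_{..k}}{p(p-1)} - \frac{X_{...}}{rp(p-1)},
$$
$$
(\widehat{ab})_{ij} = \frac{p(p-2)X_{ij.} - (p-1)(X_{i..} + X_{.j.}) - X_{j..} - X_{.i.}}{rp(p-2)} + \frac{X_{...}}{r(p-1)(p-2)}.
$$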
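
The main-effect estimates can be checked against the totals of Table 2. The Python sketch below uses exact fractions and assumes the estimates take the form $\hat{a}_i = [(p-1)X_{i..} + X_{.i.} - X_{...}]/[rp(p-2)]$, with the symmetric expression for $\hat{b}_j$; the variable names are illustrative. It verifies that the estimates satisfy the zero-sum conditions and reproduce each partner total.

```python
from fractions import Fraction

p, r = 7, 5
# Totals from Table 2, in the order Banka, Falla, Fanny, Flora, Jed, Jent, Karla.
partner_totals = [5235, 5038, 4962, 5164, 5457, 5401, 5496]       # X_i..
protagonist_totals = [4565, 6512, 6374, 4991, 4324, 4659, 5328]   # X_.j.
grand = sum(partner_totals)                                        # X_...

m_hat = Fraction(grand, r * p * (p - 1))
a_hat = [Fraction((p - 1) * xi + xj - grand, r * p * (p - 2))
         for xi, xj in zip(partner_totals, protagonist_totals)]
b_hat = [Fraction((p - 1) * xj + xi - grand, r * p * (p - 2))
         for xi, xj in zip(partner_totals, protagonist_totals)]

# Each partner total is r[(p-1)m + (p-1)a_i - b_i], because the animal's own
# protagonist effect is the one missing from its row of the layout.
rebuilt = [r * ((p - 1) * m_hat + (p - 1) * a - b) for a, b in zip(a_hat, b_hat)]

print("m_hat =", float(m_hat))
print("a_hat =", [round(float(a), 2) for a in a_hat])
print("b_hat =", [round(float(b), 2) for b in b_hat])
```

The asymmetry between a row total and its own column effect is what makes the layout nonorthogonal and produces the factor $p(p-2)$ in place of the usual $p(p-1)$.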
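The derivation of $n\hat{\sigma}_0^2$ is greatly simplified if one notes that it is equal to 

$$
n\hat{\sigma}_0^2 = \sum_i \sum_j \sum_k X_{ijk} [X_{ijk} - \hat{m} - \hat{a}_i - \hat{b}_j - \hat{c}_k - (\widehat{ab})_{ij} - (\widehat{ac})_{ik} - (\widehat{bc})_{jk}].
$$

The derivation of $n\hat{\sigma}_1^2$, when the $a_i$ are set equal to zero, leads to the 
following estimates of the constants (marked here by tildes): 

$$
\tilde{m} = \hat{m}, \quad \tilde{c}_k = \hat{c}_k, \quad (\widetilde{ab})_{ij} = (\widehat{ab})_{ij}, \quad (\widetilde{ac})_{ik} = (\widehat{ac})_{ik}, \quad (\widetilde{bc})_{jk} = (\widehat{bc})_{jk},
$$

but 

$$
\tilde{b}_j = \frac{X_{.j.}}{r(p-1)} - \frac{X_{...}}{rp(p-1)}.
$$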





In deriving these estimates, the Lagrange multipliers, by which the 
conditions on the constants are introduced, play a very important part, 
and the mathematics becomes somewhat tricky. For example, obtaining the 
estimate of $(ab)_{ij}$ involves the expression 

$$
\sum_i \sum_j \sum_k [X_{ijk} - m - b_j - c_k - (ab)_{ij} - (ac)_{ik} - (bc)_{jk}]^2 + 2 \sum_i \lambda_{i..} \sum_j (ab)_{ij} + 2 \sum_j \lambda_{.j.} \sum_i (ab)_{ij}.
$$

Noting that $m + b_j = X_{.j.}/r(p - 1)$, and using the conditions on the 
remaining constants, then the least squares equations for evaluating the 
$(ab)_{ij}$ are 

$$
r(ab)_{ij} + \lambda_{i..} + \lambda_{.j.} = X_{ij.} - \frac{X_{.j.}}{p - 1}.
$$

The Lagrange multipliers are not zero here as they are in many of the 
other derivations, but they can be eliminated by considering the related 
equations 

$$
r(ab)_{ji} + \lambda_{j..} + \lambda_{.i.} = X_{ji.} - \frac{X_{.i.}}{p - 1}.
$$
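Elimination of the Lagrange multipliers and evaluation of the $(ab)_{ij}$ is 
accomplished by use of the following relations: 

$$
\sum_{i \neq j} \lambda_{i..} + (p - 1)\lambda_{.j.} = 0,
$$
$$
(p - 1)\lambda_{j..} + \sum_{i \neq j} \lambda_{.i.} = \frac{(p - 1)X_{j..} + X_{.j.} - X_{...}}{p - 1},
$$
$$
\sum_{j \neq i} \lambda_{j..} + (p - 1)\lambda_{.i.} = 0,
$$
$$
(p - 1)\lambda_{i..} + \sum_{j \neq i} \lambda_{.j.} = \frac{(p - 1)X_{i..} + X_{.i.} - X_{...}}{p - 1},
$$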
and substitution of these relations, and the preceding ones, in the equations: 


Dw. FDA. = DN. HM. DG. BA. = 9, 


t=1 i=1 #7 ini 
and in the parallel equation: 
Do. BA. FD. HAG. = 0. 
iA¥d tj 
Considerations similar to the above lead to the remaining sums of squares. 

The reader should note that the sums of squares in formulas (1) through 
(7) do not total to the sum of squares of deviations from the grand mean of 
all observations. This is a characteristic of the nonorthogonal case in analysis 
of variance, of which the present analysis is an example. 

In order that the seven animals may be regarded as a random sample 
from a population of similar chimpanzees, the model is extended so that all 
components involving the animals will be variables which are normally and 
independently distributed with mean zero and variances to be defined. 
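Using the notation $V(X)$ for variance of $X$, we shall write 

$$
V(a_i) = \sigma_a^2, \quad V(b_j) = \sigma_b^2, \quad V[(ab)_{ij}] = \sigma_{ab}^2,
$$
$$
V[(ac)_{ik}] = \sigma_{ac}^2, \quad V[(bc)_{jk}] = \sigma_{bc}^2, \quad V(e_{ijk}) = \sigma_e^2;
$$

$m$ and $c_k$ are constants as before. 
The following conditions are imposed on the components: 

$$
\sum_k c_k = 0, \qquad \sum_k (ac)_{ik} = 0 \text{ for each } i, \qquad \sum_k (bc)_{jk} = 0 \text{ for each } j.
$$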

k 


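The mean squares using this model are derived from formulas (1) through 
(7). The expected values of the mean squares are 

$$
\begin{aligned}
E(\text{MS partners}) &= \sigma_e^2 + r\sigma_{ab}^2 + \frac{rp(p-2)}{p-1}\,\sigma_a^2, \\
E(\text{MS protagonists}) &= \sigma_e^2 + r\sigma_{ab}^2 + \frac{rp(p-2)}{p-1}\,\sigma_b^2, \\
E(\text{MS replications}) &= \sigma_e^2 + \frac{r(p-1)}{r-1}\,(\sigma_{ac}^2 + \sigma_{bc}^2) + \frac{p(p-1)}{r-1} \sum_k c_k^2, \\
E(\text{MS partner} \times \text{protagonist}) &= \sigma_e^2 + r\sigma_{ab}^2, \\
E(\text{MS partner} \times \text{replication}) &= \sigma_e^2 + \frac{rp(p-2)}{(r-1)(p-1)}\,\sigma_{ac}^2, \\
E(\text{MS protagonist} \times \text{replication}) &= \sigma_e^2 + \frac{rp(p-2)}{(r-1)(p-1)}\,\sigma_{bc}^2, \\
E(\text{MS error}) &= \sigma_e^2.
\end{aligned}
$$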
The formulas for testing main effects against interactions, (8) through 
(10), are derived from these expectations by methods described in [1, chap. 
28]. From these expectations the reliability formulas shown above can also 
be derived. The values of $s_e^2$, $s_{ac}^2$, $s_{bc}^2$, etc., are obtained by solving the 
expressions for the expectations for the variances $\sigma_e^2$, $\sigma_{ac}^2$, $\sigma_{bc}^2$, etc., successively, 
and then substituting sample values for the population values. 

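To derive the formula for replication reliability, from the second model 

$$
\frac{X_{.jk}}{p-1} = m + b_j + c_k + (bc)_{jk} + \frac{\sum_i a_i}{p-1} + \frac{\sum_i (ab)_{ij}}{p-1} + \frac{\sum_i (ac)_{ik}}{p-1} + \frac{\sum_i e_{ijk}}{p-1},
$$

the sums extending over the $p - 1$ partners $i \neq j$. 
Consider now the mean for a protagonist at a parallel replication $k'$ 
so that 

$$
\frac{X_{.jk'}}{p-1} = m + b_j + c_{k'} + (bc)_{jk'} + \frac{\sum_i a_i}{p-1} + \frac{\sum_i (ab)_{ij}}{p-1} + \frac{\sum_i (ac)_{ik'}}{p-1} + \frac{\sum_i e_{ijk'}}{p-1}.
$$

By the model 

$$
E\Big(\frac{X_{.jk}}{p-1}\Big) = m + c_k, \qquad E\Big(\frac{X_{.jk'}}{p-1}\Big) = m + c_{k'}.
$$

Using these expectations and noting that $X_{.jk}/(p - 1)$ and $X_{.jk'}/(p - 1)$ 
differ only in the terms containing $k$ and $k'$ as subscripts, 

$$
E\Big[\Big(\frac{X_{.jk}}{p-1} - m - c_k\Big)\Big(\frac{X_{.jk'}}{p-1} - m - c_{k'}\Big)\Big] = \sigma_b^2 + \frac{\sigma_a^2}{p-1} + \frac{\sigma_{ab}^2}{p-1}.
$$

By the model the expectation of a square of a variable is its variance, 
whereas the expectation of a product of two different variables is zero. Also 

$$
V\Big(\frac{X_{.jk}}{p-1}\Big) = V\Big(\frac{X_{.jk'}}{p-1}\Big) = \sigma_b^2 + \sigma_{bc}^2 + \frac{\sigma_a^2}{p-1} + \frac{\sigma_{ab}^2}{p-1} + \frac{\sigma_{ac}^2}{p-1} + \frac{\sigma_e^2}{p-1}.
$$

The covariance and the variances jointly lead to the replication reliability 
(17). 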
To obtain the formula for partner reliability, by the second model 

$$
\frac{X_{ij.}}{r} = m + a_i + b_j + (ab)_{ij} + \frac{\sum_k e_{ijk}}{r},
$$

since in summing over all replications the terms containing $k$ vanish. 
Similarly, for a different partner 

$$
\frac{X_{i'j.}}{r} = m + a_{i'} + b_j + (ab)_{i'j} + \frac{\sum_k e_{i'jk}}{r}.
$$

Formula (18) follows readily. 
To obtain formula (19) correlate 

$$
X_{ijk} = m + a_i + b_j + c_k + (ab)_{ij} + (ac)_{ik} + (bc)_{jk} + e_{ijk}
$$

with the corresponding expression for $X_{i'jk'}$, where $i$ and $k$ differ from $i'$ 
and $k'$. 

ON THE RANKING PROBLEM 


Observed rankings of objects can be treated as arising from a time 
dependent probability process. Under such circumstances, associations 
observed are an indication of the character of this underlying process. In 
the particular example treated in some detail here, a quantity related to 
Kendall’s tau is found to have an important role and its properties are 


examined. 


1. Introduction 


One of the frequent problems which research workers in the social 
sciences raise with statisticians involves relations between variates when 
the observations consist of rankings of the units which are observed. There 
are many procedures for testing the hypothesis that the rankings are inde- 
pendent in a probability sense. But the research worker has usually antici- 
pated this possibility—he is really interested in measuring the concordance 
of these rankings. On this point the measures hitherto proposed have not 
proved completely satisfactory. Speaking generally, they suffer in varying 
degrees from difficulties of interpretation, from stringent assumptions about 
underlying measurements, or from the lack of an adequate sampling theory. 

It is the purpose of this paper to suggest the usefulness of probability 
processes as the point of departure for some problems of this sort. Instead 
of viewing a particular ranking as one out of a hypothetically infinite set of 
random drawings of objects from some population, suppose that there is 
some process producing the possible rankings, and that each ranking has its 
probability determined by the character of the process. The observed rankings 
of the n objects then give some insight into the process which is producing 
the concordance. 

A particularly simple model will be used. Those who find it unrealistic 
and unreasonable as an approximation to their problems may try more 
complex ones. They are warned, however, that computational difficulties 
can easily become overwhelming with no corresponding gain in usefulness. 


2. Single Judge and a Standard Ranking: The Model 


There are a number of ways to formulate the problem. For convenience, 
it will be assumed that a judge is asked to rank a collection of n objects 
which have some natural order. These objects could be students who are 
being ranked on ability, houses which are to be ranked on liveability, automobiles 
on beauty, and so on. The judge will generally begin with a tentative 
ranking and then compare objects whose rankings are close. He will adjust 
his rankings by comparison of close objects until satisfied or until required 
to report his ranking. 

In this model, it is assumed that at each comparison the judge takes
only those $(n - 1)$ pairs which differ by one rank and that he chooses one
among these pairs at random, i.e., with probability $(n - 1)^{-1}$ for each pair.
It is assumed that there is a preferred or standard order for the pair, and that
the pair is put into this order with probability $p$, $0 < p < 1$, and in the reverse
order with probability $q = 1 - p$. This process is repeated over and over,
selecting one among the $(n - 1)$ adjacent pairs at random and assigning
them the standard order with probability $p$ without regard to the previous
ordering.
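
As an illustration only, a single comparison of this kind can be sketched in Python. The function names `compare_once` and `run`, the labelling of the objects as 1, ..., n, and the convention that ascending labels form the standard order are assumptions of the sketch, not part of the model as stated.

```python
import random

def compare_once(ranking, p, rng=random):
    """Apply one comparison of the model to a ranking (a tuple of labels).

    Assumption of this sketch: objects are labelled 1..n and the standard
    order of any pair is ascending label order.
    """
    r = list(ranking)
    k = rng.randrange(len(r) - 1)       # each of the n-1 adjacent pairs has probability 1/(n-1)
    lo, hi = sorted((r[k], r[k + 1]))
    if rng.random() < p:                # standard order with probability p
        r[k], r[k + 1] = lo, hi
    else:                               # reverse order with probability q = 1 - p
        r[k], r[k + 1] = hi, lo
    return tuple(r)

def run(ranking, p, steps, seed=0):
    """Repeat the comparison `steps` times, without regard to earlier outcomes."""
    rng = random.Random(seed)
    for _ in range(steps):
        ranking = compare_once(ranking, p, rng)
    return ranking
```

With p close to one the simulated judge drifts toward the standard order; with p close to zero, toward the inverse order.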

This is a Markov process. The essential probability characteristics which 
determine the shift from one ranking to another can be given by a transition 
matrix, each element in the transition matrix being the probability that a 
ranking in one order will be changed to another order at each stage of the 
process. 

Before writing this transition matrix, it will be convenient to group the
$n!$ orders into classes. The first class, $S_0$, will have one ranking, the objects
in a standard order. The second class, $S_1$, will consist of the $n - 1$ possible
rankings obtained by permuting adjacent objects in the ranking $S_0$. If
adjacent objects are then permuted in rankings in the class $S_1$, either the
member of $S_0$ or a new ordering is obtained. All such new orderings form the
class $S_2$. Similarly $S_3$ is formed by permuting adjacent objects in $S_2$ and
taking those rankings not in $S_1$; $S_4$ is formed from $S_3$, and so on. It can
be shown that the number of such classes is $[n(n - 1)/2] + 1$.
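
The grouping into classes can be checked by enumeration for small n. The sketch below uses the standard fact that the least number of adjacent interchanges separating a ranking from the standard order equals its number of inversions, so that the class subscript is taken to be the inversion count; the function names are illustrative.

```python
from itertools import permutations

def inversions(ranking):
    """Number of object pairs that appear in non-standard (descending) order."""
    n = len(ranking)
    return sum(ranking[i] > ranking[j] for i in range(n) for j in range(i + 1, n))

def classes(n):
    """Group the n! rankings of objects 1..n by class subscript s."""
    groups = {}
    for perm in permutations(range(1, n + 1)):
        groups.setdefault(inversions(perm), []).append(perm)
    return groups

# Class sizes for n = 3 and n = 4, listed from S_0 upward.
sizes = {n: [len(classes(n)[s]) for s in sorted(classes(n))] for n in (3, 4)}
```

For n = 4 this gives seven classes, in agreement with $[n(n - 1)/2] + 1$.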

The transition matrix will be an $n!$ by $n!$ matrix in which the element
in the $i$th row and $j$th column will be the probability that the $i$th of the $n!$
possible rankings of $n$ objects will be changed by the judge into the $j$th of
the possible rankings. The matrix will be written so that the first row will
give the probability that a judge, given the objects in the standard order
(in class $S_0$), will change to any other order or continue to use the ranking $S_0$.
The first column, on the other hand, will give the probability that he will
move from any ranking into $S_0$.

The second through the $n$th rows will give the probability of moving
to any ranking from any one of the rankings in $S_1$. The subscripts $i = 2, \cdots, n$
may be assigned in any arbitrary way to identify the rankings in $S_1$; corresponding
subscripts are used for columns 2 through $n$ in order to give the
probabilities of moving into the corresponding rankings in $S_1$. The next
rows belong to rankings in the class $S_2$, and so on to the one inverse ranking
in $S_{n(n-1)/2}$.

The probability of moving from $S_0$ into any particular ranking in $S_1$
is $q(n - 1)^{-1}$. There is no possibility of moving into any other ranking in a
single comparison, and all other transition probabilities in the first row,
except the one in the first column, designated $m_{11}$, are zero. Since the sum
of the elements in any row must be unity, $m_{11} = p$.

In comparing pairs when the ranking is in $S_1$, there are three possibilities:
(i) If the one pair which is not in the natural order is reversed, then the new
ranking will be in $S_0$. The probability of this is $p(n - 1)^{-1}$. (ii) If any other
pair is reversed then the new ranking will be in $S_2$. The probability of moving
from one of the rankings in $S_1$ to any particular ranking in $S_2$ which can be
reached from it by one permutation is $q(n - 1)^{-1}$. (iii) There will be no
change with probability $[q + (n - 2)p](n - 1)^{-1}$, since the row sum must
total to unity and all other probabilities in the second through the $n$th rows
must be zero.

In general, the $n!$ by $n!$ probability matrix is partitioned into
$\{[n(n - 1)/2] + 1\}^2$ sub-matrices corresponding to the classes described
above. All sub-matrices off the principal diagonal of sub-matrices by more
than one class have zero elements since one permutation moves a ranking no
more than one class at a time.

From any ranking, call it the $i$th, there are $(n - 1)$ possible other rankings
which can be reached by one permutation. There will be some number
$c_i$ $(0 \le c_i \le n - 1)$ of these which lead to the class with a subscript which
is one lower than that of the $i$th ranking. The probability of arriving at any
particular one of these $c_i$ rankings is $p(n - 1)^{-1}$.

There will be $n - c_i - 1$ possible rankings in the class with the next
higher subscript than the $i$th ranking. The probability of any particular
one of these is $q(n - 1)^{-1}$. Finally, the probability that there will be no
change in a single comparison is

(2.1)    $m_{ii} = [c_i q + (n - c_i - 1)p](n - 1)^{-1}$.

All other elements in the row corresponding to this ranking are zero.
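For example, for $n = 3$,

(2.2)
\[
M(p) = \left[\begin{array}{c|cc|cc|c}
p   & q/2     & q/2     & 0       & 0       & 0   \\ \hline
p/2 & (q+p)/2 & 0       & q/2     & 0       & 0   \\
p/2 & 0       & (q+p)/2 & 0       & q/2     & 0   \\ \hline
0   & p/2     & 0       & (q+p)/2 & 0       & q/2 \\
0   & 0       & p/2     & 0       & (q+p)/2 & q/2 \\ \hline
0   & 0       & 0       & p/2     & p/2     & q
\end{array}\right],
\]
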
where $M(p)$ is the transition matrix whose probabilities are determined by $p$.

For $p = 1/2$ and all $n$, $M(p)$ is symmetric. Furthermore, it is always
possible after finitely many stages of the process, i.e., after finitely many
comparisons, for any of the possible rankings to arise. Hence, from the
theory of Markov chains [1] if a sufficiently large number of comparisons
is to be made the probability of having any ranking will be the same as
that for having any other ranking, i.e., $(n!)^{-1}$. More precisely, when $p = 1/2$
the limit of the probability that a judge will have the objects ranked in a
particular way is $(n!)^{-1}$ no matter what the initial tentative ranking.
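
The construction of the transition matrix can be sketched as follows, using exact fractions. The function name `transition_matrix`, the list `STATES_3`, and the particular order assigned to the rankings within each class (which the text leaves arbitrary) are assumptions of the sketch; objects are again labelled 1, ..., n with ascending order as the standard order.

```python
from fractions import Fraction

def transition_matrix(states, p):
    """Transition matrix of the comparison process over the listed rankings."""
    p = Fraction(p)
    q = 1 - p
    n = len(states[0])
    index = {s: i for i, s in enumerate(states)}
    M = [[Fraction(0)] * len(states) for _ in states]
    for s in states:
        for k in range(n - 1):                      # adjacent pair in positions k, k+1
            lo, hi = sorted((s[k], s[k + 1]))
            standard = s[:k] + (lo, hi) + s[k + 2:]
            reverse = s[:k] + (hi, lo) + s[k + 2:]
            M[index[s]][index[standard]] += p / (n - 1)
            M[index[s]][index[reverse]] += q / (n - 1)
    return M

# One admissible ordering of the six rankings for n = 3: S_0, S_1, S_2, S_3.
STATES_3 = [(1, 2, 3), (2, 1, 3), (1, 3, 2), (2, 3, 1), (3, 1, 2), (3, 2, 1)]
M = transition_matrix(STATES_3, Fraction(2, 3))
```

With this ordering the rows of `M` have the pattern of (2.2), and setting p = 1/2 gives a symmetric matrix.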

When $p \ne 1/2$, the limit probabilities can be obtained from the theory
of Markov chains by solving for $t_i$ in

(2.3)    $\sum_{i=1}^{n!} t_i m_{ij} = t_j, \qquad (j = 1, 2, \cdots, n!)$

and

         $\sum_{i=1}^{n!} t_i = 1$,

where $m_{ij}$ is the element in the $i$th row and $j$th column of $M(p)$. (When $p$
is zero or one there is a trivial special case which is excluded hereafter.)
In particular, for $j = 1$,

(2.4)    $p t_1 + t_2 p(n - 1)^{-1} + \cdots + t_n p(n - 1)^{-1} = t_1$.

Rewriting,

(2.5)    $q t_1 = (n - 1)^{-1} p (t_2 + \cdots + t_n)$,

for which the relationship $t_{k_1} = q p^{-1} t_1$ is a solution when $k_1 = 2, 3, \cdots, n$.

For $j = 2, 3, \cdots, (n! - 1)$, the equations (2.3) can be typified by the
one for $j = 2$. This equation is

(2.6)    $t_1 q(n - 1)^{-1} + t_2[q + (n - 2)p](n - 1)^{-1} + t_{n+1} p(n - 1)^{-1} + \cdots + t_{2n-2} p(n - 1)^{-1} = t_2$.

By straightforward substitution in (2.6), it is seen that $t_1 = p q^{-1} t_2$ and
$t_{k_2} = q p^{-1} t_2$ are solutions, where $k_2 = n + 1, n + 2, \cdots, 2n - 2$.

There are, in general, more than $(n - 2)$ rankings in $S_2$. It is clear,
however, that solutions of (2.3) for any ranking in $S_2$ are $t_{k_2} = q^2 p^{-2} t_1$,
where $k_2$ ranges over the rankings in $S_2$. Similarly, for any ranking in $S_3$,
$t_{k_3} = q^3 p^{-3} t_1$ are solutions to (2.3) and generally, $t_{k_s} = q^s p^{-s} t_1$, where $k_s$
ranges over the states in the class with subscript $s$. If the condition that
$\sum_{i=1}^{n!} t_i = 1$ is added, the above solutions are unique.

The limit probabilities which are the solution to (2.3) determine the
behavior of the observed rankings. It is this probability distribution (see
Tables 1 and 2 below for $n = 3$ and 4) which is required for the empirical
study of the behavior of the process described previously. Things are simplified
further if it is noted that, for fixed $s$ and $p$, the limit probabilities of all members
in the class $S_s$, $s = 0, 1, 2, \cdots, n(n - 1)/2$, are the same. That is, $s$ is a
sufficient statistic for $p$, and we can restrict ourselves to the distribution of $s$.
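
That limit probabilities proportional to $(q/p)^s$ satisfy the stationary equations can be verified exactly for small n. The following sketch is self-contained; its function names, and the labelling of the objects as 1, ..., n with ascending order as standard, are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import permutations

def inversions(ranking):
    """Class subscript s of a ranking: its number of inverted pairs."""
    n = len(ranking)
    return sum(ranking[i] > ranking[j] for i in range(n) for j in range(i + 1, n))

def limit_probabilities(n, p):
    """Candidate limit distribution: each ranking in class s has weight (q/p)^s."""
    p = Fraction(p)
    r = (1 - p) / p
    w = {s: r ** inversions(s) for s in permutations(range(1, n + 1))}
    total = sum(w.values())
    return {s: v / total for s, v in w.items()}

def one_step(t, p):
    """Distribution after one comparison, starting from the distribution t."""
    p = Fraction(p)
    q = 1 - p
    n = len(next(iter(t)))
    out = {s: Fraction(0) for s in t}
    for s, mass in t.items():
        for k in range(n - 1):
            lo, hi = sorted((s[k], s[k + 1]))
            out[s[:k] + (lo, hi) + s[k + 2:]] += mass * p / (n - 1)
            out[s[:k] + (hi, lo) + s[k + 2:]] += mass * q / (n - 1)
    return out
```

If `t = limit_probabilities(n, p)`, then `one_step(t, p)` returns `t` unchanged, which is the content of (2.3).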

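When $p = 1/2$, this distribution is basically that of Kendall's tau, since
$\tau = 1 - 4s n^{-1}(n - 1)^{-1}$. The probability generating function for this case
is essentially given by Kendall [2] and is

(2.7)    $F(x) = (n!)^{-1}(1 + x)(1 + x + x^2) \cdots (1 + x + \cdots + x^{n-1}) = [n!(1 - x)^n]^{-1} \prod_{j=1}^{n} (1 - x^j), \qquad |x| < 1$.

Hence, from (2.7) and the solutions to the equations like (2.6), the moment
generating function of $s$ is, for $0 < p < 1$,

(2.8)    $\phi(\theta) = E(e^{\theta s}) = \prod_{j=1}^{n} \frac{1 + re^{\theta} + \cdots + (re^{\theta})^{j-1}}{1 + r + \cdots + r^{j-1}} = \left(\frac{1 - r}{1 - re^{\theta}}\right)^{n} \prod_{j=1}^{n} \frac{1 - (re^{\theta})^{j}}{1 - r^{j}}$,

where $|re^{\theta}| < 1$, and $r = qp^{-1}$.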
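
The distribution of s implied by (2.7) and (2.8) can be computed by expanding the product polynomial and weighting the coefficient of $x^s$ by $r^s$. The sketch below is one way of doing so; the function names are illustrative.

```python
from fractions import Fraction

def class_sizes(n):
    """Coefficients a_s of (1 + x)(1 + x + x^2)...(1 + x + ... + x^(n-1))."""
    coeffs = [1]
    for j in range(2, n + 1):                 # multiply by 1 + x + ... + x^(j-1)
        new = [0] * (len(coeffs) + j - 1)
        for i, c in enumerate(coeffs):
            for k in range(j):
                new[i + k] += c
        coeffs = new
    return coeffs

def s_distribution(n, p):
    """P{s} for s = 0, 1, ..., n(n-1)/2, with r = q/p."""
    p = Fraction(p)
    r = (1 - p) / p
    weights = [a * r ** s for s, a in enumerate(class_sizes(n))]
    total = sum(weights)
    return [w / total for w in weights]
```

For n = 3 and p = 2/3 this yields 8/21, 8/21, 4/21, 1/21, i.e., .381, .381, .190, .048.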

Before proceeding further, it should be noted that the process discussed 
here is not the only process which gives limit distributions whose generating 
functions are (2.7) and (2.8). For example, any Markov process whose 
transition matrix had column totals equal to unity would give (2.7). Similarly, 
other simple processes lead to (2.8). (See for example [3].) Such processes 
have not been explored here because none seemed more reasonable simple 
approximations to useful situations than the one treated. And finally, as 
pointed out earlier, more complex distributions can also follow from other
assumptions, but the exploration of these was beyond the purposes of this
paper.


3. Single Judge and Standard Ranking: Inference Problems 


Tests which involve $\tau$ in the usual way have optimum properties. Thus,
to test the null hypothesis that an observed ranking arose from a process in
which there is no tendency toward conformity with a natural ranking, i.e.,
$p = 1/2$, against alternatives that there is a tendency toward conformity,
$p > 1/2$, a uniformly most powerful test is to reject the null hypothesis
whenever $s$ is small, i.e., whenever $\tau$ is large. Similar remarks apply to
invariant tests of $p = 1/2$ against $p \ne 1/2$.

The more interesting problems involve the estimation of the error, i.e.,
the estimation of $p$. It will first be shown that $\tau$ is not a satisfactory estimator
of error when the process above is relevant.

Using the moment generating function of $s$ (2.8),

(3.1)    $E(s) = \left.\frac{\partial \log \phi}{\partial \theta}\right|_{\theta = 0} = \frac{nr}{1 - r} - \sum_{j=1}^{n} \frac{j r^{j}}{1 - r^{j}}$.

(We restrict ourselves to $r < 1$. The case $r > 1$ goes through in a parallel
way, and the case $r = 1$ is treated by taking advantage of the continuity of
the moment generating function.) Now,





(3.2)    $E(\tau) = 1 - \frac{4}{n(n - 1)} E(s)$.

(3.2) can be bounded by

(3.3)    $E(\tau) \le 1 - \frac{4r}{(n - 1)(1 - r)} + \frac{4r}{n(n - 1)(1 - r)^{3}}$

and

(3.4)    $E(\tau) \ge 1 - \frac{4r}{(n - 1)(1 - r)}$.

Hence, $\lim_{n \to \infty} E(\tau) = 1$ and, since $\tau$ must be less than unity, $\mathrm{plim}_{n \to \infty}\, \tau = 1$
whenever $r < 1$. Similarly, it can be shown that when $r > 1$, $\mathrm{plim}_{n \to \infty}\, \tau = -1$.
When $r = 1$, we can obtain $E(\tau) = 0$ and compute the variance by using
(2.8) and L'Hospital's rule. It follows that $\mathrm{plim}_{n \to \infty}\, \tau = 0$ whenever $r = 1$.


Why $\tau$ has the peculiar property of converging to only these three
values is seen from (2.7) and (2.8). For $r < 1$, the distribution of $s$ is that of
a convolution of $n$ independent truncated geometric distributions, each
with parameter $r$. Hence, $s/n$ will have the ordinary sort of behavior and, in
particular, will have as its limit the normal distribution whose mean is

(3.5)    $E(s/n) = (1 - p)(2p - 1)^{-1}$,

and whose variance is

(3.6)    $\mathrm{var}(s/n) = p(1 - p)(2p - 1)^{-2} n^{-1}$.
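
As a numerical check, offered only as a sketch, the closed form (3.1) can be compared with the mean computed directly from the weights $a_s r^s$, and the behavior of $E(\tau)$ and $E(s/n)$ examined for a moderately large n. The function names are illustrative, and r is assumed to be less than one.

```python
from fractions import Fraction

def expected_s_formula(n, r):
    """E(s) from (3.1), valid for r different from 1."""
    return n * r / (1 - r) - sum(j * r ** j / (1 - r ** j) for j in range(1, n + 1))

def expected_s_exact(n, r):
    """E(s) computed directly from the class weights a_s r^s."""
    coeffs = [1]
    for j in range(2, n + 1):                 # multiply by 1 + x + ... + x^(j-1)
        new = [0] * (len(coeffs) + j - 1)
        for i, c in enumerate(coeffs):
            for k in range(j):
                new[i + k] += c
        coeffs = new
    weights = [a * r ** s for s, a in enumerate(coeffs)]
    return sum(s * w for s, w in enumerate(weights)) / sum(weights)

def expected_tau(n, r):
    """E(tau) from (3.2)."""
    return 1 - 4 * expected_s_formula(n, r) / (n * (n - 1))
```

With r = 1/2 (p = 2/3) and n = 200, $E(\tau)$ is already about .98, while $E(s/n)$ is close to the value $(1 - p)(2p - 1)^{-1} = 1$ given by (3.5).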

When $r = 1$, the distribution of $s$ is that of the convolution of $n$ independent
chance variables, each having a point rectangular distribution,
i.e., each chance variable $x_i$ has the distribution $P\{x_i = j\} = (1 + i)^{-1}$ for
$j = 0, 1, 2, \cdots, i$. Hence, $\sum_{i=0}^{n-1} x_i = s$ must be divided by a term of order
$n^2$, as is done in computing $\tau$, if the mean of the limiting distribution is to
be a bounded positive quantity.

Since $\tau$ is not a satisfactory measure, attention is turned to estimation
procedures for $p$ and functions of $p$. The problem will be complicated by
the difficulty in writing the probability distribution for $s$ explicitly. For
small $n$, however, Table 16.3 in Kendall [1] can be used to write the exact
probabilities, comparing various estimation procedures in this way, while
for large $n$ the asymptotic normality of $s$ can be used.

A reasonable requirement for an estimate of $p$ when $n$ is small might
well be unbiasedness. The method of obtaining unbiased estimates can be
illustrated for $n = 3$. The probability that $s = 0$, when $r < 1$, is

         $P\{s = 0\} = \frac{(1 - r)^{2}}{(1 - r^{2})(1 - r^{3})} = t_0$;

         $P\{s = 1\} = 2rt_0$,
         $P\{s = 2\} = 2r^{2}t_0$,
         $P\{s = 3\} = r^{3}t_0$.

Hence, if $p_0^*$ is the estimator used when $s = 0$, $p_1^*$ when $s = 1$, and so on,
then from the definition of unbiasedness

         $p_0^* t_0 + 2p_1^* r t_0 + 2p_2^* r^{2} t_0 + p_3^* r^{3} t_0 = p = (1 + r)^{-1}$.

Equating coefficients of the like powers of $r$,

         $p_0^* = 1, \quad p_1^* = 1/2, \quad p_2^* = 1/2, \quad p_3^* = 0$.

It is not difficult to show that these estimates are unbiased for all values of $p$.
In general, let $a_s$ be the coefficient of $x^s$ in the polynomial
$(1 + x)(1 + x + x^2) \cdots (1 + x + \cdots + x^{n-1})$ and $b_s$ be the coefficient of $x^s$ in
$(1 + x + x^2)(1 + x + x^2 + x^3) \cdots (1 + x + \cdots + x^{n-1})$. Then the unbiased estimates are

(3.7)    $p_s^* = \frac{b_s}{a_s} = \frac{b_s}{b_{s-1} + b_s}$, writing $b_{-1} = 0$.


This estimation procedure has the property of consistency but as $n$
becomes large the difficulties in obtaining $b_s$ are the same as those for $a_s$.
A somewhat easier consistent estimate might, therefore, be used. One such
estimate is

(3.8)    (a) when $\frac{s}{n(n - 1)} < 1/4$ and $n > 2$,    $\hat{p} = \frac{n + s - 3}{n + 2s - 3}$;

         (b) when $\frac{s}{n(n - 1)} > 1/4$ and $n > 2$,    $\hat{p} = 1 - \frac{n + s^* - 3}{n + 2s^* - 3}$;

         (c) when $\frac{s}{n(n - 1)} = 1/4$ or $n = 2$,    $\hat{p} = 1/2$;

where

         $s^* = \frac{n(n - 1)}{2} - s$.


This estimation procedure is based on the fact that $s$ has, approximately,
a negative binomial distribution and (a) and (b) are essentially maximum
likelihood estimates when $r < 1$ and $r > 1$, respectively, with a correction
to reduce bias for small $n$. Since, when $n$ is large, the conditions for use of
(a) and (b) hold with probability arbitrarily close to unity the estimates are
consistent for $r \ne 1$.

For r = 1, (a) and (b) will be used with equal probability. If (a) is used, 
then the conditional probability 


: = aa. 
(3.9) PAlp | <tot<tho 
since

         $E(s) = \frac{n(n - 1)}{4}$ and $\mathrm{var}(s) = \frac{n(n - 1)(2n + 5)}{72}$.

A similar relation holds for (b), and together with (c), proves consistency.

The efficiency of these estimates is unknown. The probability distributions
of $s$ for $n = 3$ and $n = 4$ are given below for three values of $p$ together
with the estimates $p^*$ and their variance. (Our estimate $\hat{p}$ coincides with $p^*$
for these values of $n$.)

When $p \ne 1/2$ and $n$ is large, the estimator is approximately normally
distributed and it can be shown that

(3.10)    $\mathrm{var}(\hat{p}) \sim p(1 - p)(2p - 1)^{2}/n$.


TABLE 1

Probability Distribution of s and Estimate of p for n = 3

                                      P{s}
    s       p*_s = p̂      p = 1/2    p = 2/3    p = 3/4
    0         1.0           .167       .381       .519
    1         0.5           .333       .381       .346
    2         0.5           .333       .190       .115
    3         0.0           .167       .048       .019

    Var (p*)                .084       .075       .072


TABLE 2

Probability Distribution of s and Estimate of p for n = 4

                                      P{s}
    s       p*_s = p̂      p = 1/2    p = 2/3    p = 3/4
    0         1.00          .042       .203       .350
    1         0.67          .125       .305       .350
    2         0.60          .208       .254       .195
    3         0.50          .250       .152       .078
    4         0.40          .208       .064       .022
    5         0.33          .125       .019       .004
    6         0.00          .042       .003       .000

    Var (p*)                .032       .033       .038





This approximation is useful only when n is at least moderately large. Bias 
seems to be relatively unimportant. For n = 6 and p = 2/3, for example, 
there is an upward bias of .016. 

When p = 1/2 there is no bias. In addition, it can be shown that when n 
is large p is approximately 1/2 + 1/n. 


4. Several Processes 


Now consider the case where there are several processes, each with the 
same standard ranking. In practice, these processes may arise from m judges 
attempting to rank n students on intelligence, n cities which have m different 
indices of their development, and so on. The standard ranking is unknown 
to the statistician. He is asked to estimate this ranking and to give some 
indication of the usefulness of the judges or indices for ranking purposes. 

Each of the m estimates of the standard ranking is given independently 
and assumed to be derived by a process which gives the probability distri- 
bution of section 2. The probability of any particular set of m rankings is, 
therefore, 

(4.1)    $P\{s_1\} P\{s_2\} \cdots P\{s_m\} = \prod_{i=1}^{m} \left[ r_i^{s_i} \prod_{j=1}^{n} \frac{1 - r_i}{1 - r_i^{j}} \right], \qquad r_i < 1$,

where $P\{s_i\}$ is the probability that the ranking arising from process
$i$ $(i = 1, 2, \cdots, m)$, for which $r_i$ is the error parameter of the $i$th judge, is
in the class which is $s_i$ permutations from the standard ranking.

In particular, if $r_1 = r_2 = \cdots = r_m = r$,

(4.2)    $\prod_{i=1}^{m} P\{s_i\} = r^{s} \left[ \prod_{j=1}^{n} \frac{1 - r}{1 - r^{j}} \right]^{m}$,

where $s = s_1 + s_2 + \cdots + s_m$, with the restriction that $r < 1$.

A maximum likelihood estimate of the standard ranking is obtained if
a ranking which minimizes the total number of permutations from the
estimated standard order is chosen. This is very simple in practice. For
each process, assign rank 1 to the "best" object, rank 2 to the next, and so
on to rank $n$ for the last. Obtain the total (or average) score for each object
over all processes. A maximum likelihood estimate of the natural order
is given by assigning 1st rank to the object with the lowest score, 2nd to
the object with next lowest score, and so on to the $n$th rank for the object
with the highest total score. In the event that there are ties, this application
of the method of maximum likelihood does not give a unique answer, and a
randomization procedure which gives equal probability among the possible
rankings of tied objects seems reasonable.
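
The scoring procedure just described can be sketched as follows. The function name `estimate_standard_ranking` and the representation of each observed ranking as a sequence of objects listed from rank 1 to rank n are assumptions of the sketch; ties are broken by shuffling before a stable sort, which gives each ordering of tied objects equal probability.

```python
import random

def estimate_standard_ranking(rankings, rng=random):
    """Order the objects by total rank over the m observed rankings.

    rankings[i][k] is the object which process i places at rank k + 1.
    """
    objects = list(rankings[0])
    score = {obj: 0 for obj in objects}
    for ranking in rankings:
        for position, obj in enumerate(ranking, start=1):
            score[obj] += position
    rng.shuffle(objects)                                  # random order among tied objects
    return sorted(objects, key=lambda obj: score[obj])    # stable sort keeps that order within ties
```

For example, the three rankings (a, b, c), (a, c, b), (b, a, c) give total scores 4, 6, 8 for a, b, c and hence the estimate a, b, c.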

It is clear that, for fixed n, the standard ranking will tend to be chosen 
with probability one as m increases. For small m, however, the probability 
that a particular ranking will be chosen as the standard ranking does not 
seem to be easily obtainable in general. 

For example, for $n = 3$ and $m = 2$, the probability of a correct ranking is
$[(1 + r)(1 + r + r^2)]^{-1}$, the probability of either of the rankings one permutation
from the standard order is $r(1 + 2r)[(1 + r)(1 + r + r^2)^2]^{-1}$, the
probability of either of the two rankings two permutations removed is
$r^3(r + 2)[(1 + r)(1 + r + r^2)^2]^{-1}$, while the inverse ranking has probability
$r^3[(1 + r)(1 + r + r^2)]^{-1}$ of being the estimate. There is little gain over
using only one process to estimate the standard ranking. For then the probability
of correct and inverse ranking is the same as above for $m = 2$ while
that of either ranking one permutation removed is $r(1 + r)^{-1}(1 + r + r^2)^{-1}$.
This peculiarity seems to arise in large part, however, from the excessive
number of ties possible in this special case.
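
The probabilities for $n = 3$ and $m = 2$ can be confirmed by exact enumeration of all pairs of rankings, with ties in total score resolved with equal probability. The sketch below is illustrative only: the function names are hypothetical, objects are labelled 1, ..., n, and each ranking in class s is given weight $r^s$ as in section 2.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import permutations, product

def inversions(ranking):
    n = len(ranking)
    return sum(ranking[i] > ranking[j] for i in range(n) for j in range(i + 1, n))

def estimate_distribution(n, m, r):
    """Exact distribution of the total-score estimate of the standard order."""
    r = Fraction(r)
    rankings = list(permutations(range(1, n + 1)))    # x[k] is the object placed at rank k + 1
    weight = {x: r ** inversions(x) for x in rankings}
    total = sum(weight.values())
    dist = defaultdict(Fraction)
    for combo in product(rankings, repeat=m):         # one ranking per process
        prob = Fraction(1)
        for x in combo:
            prob *= weight[x] / total
        score = {obj: sum(x.index(obj) + 1 for x in combo) for obj in range(1, n + 1)}
        # Orders compatible with the scores; tied objects may appear in any order.
        compatible = [e for e in rankings
                      if all(score[e[k]] <= score[e[k + 1]] for k in range(n - 1))]
        for e in compatible:
            dist[e] += prob / len(compatible)
    return dict(dist)
```

With r = 1/2 the enumeration gives 8/21 for the correct ranking, 32/147 for each ranking one permutation removed, 10/147 for each ranking two permutations removed, and 1/21 for the inverse ranking.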

In the estimation of p, the methods of the previous section are also 
relevant. That is, a consistent estimate of p as n becomes large is given by 


3 s m 

(4.3) 5 = aera Se , When or a | iD <j 2 ; 
and an appropriate modification is made otherwise. When $n$ is small, however,
it seems reasonable to consider only the unbiased estimates (3.7), to compute
this estimate for each process, and then to average. Clearly, such an estimate
is consistent in $m$ and unbiased.

In the cases considered so far the association between the processes
arose because there was a common standard ranking. The apparent differences
in the observed rankings arose from fluctuations around this common
standard. The number of permutations from the estimated standard ranking
measured the tendency for the observed rankings to agree.

The comparison of several processes which do not have the same error
or processes which do not have the same standard ranking involves more
difficult problems which will not be treated in this paper. As rough practical
guides, however, weighting the observed rankings in order to get better 
estimates of the standard order would in the first problem be an improvement 
over the unweighted scheme used earlier. Each ranking would lead to its 
own estimate of error. 

If the standard ranking is not the same for all $m$ processes, this would
lead to greater disagreement among the observed rankings than if there
was a common standard. The pooled estimate of $p$ would then imply greater
variability among the separate values of $s/n$ for each process than is the case.
One might, therefore, reject the hypothesis of common standard ranking
when $\sum_{i=1}^{m} (s_i - \bar{s})^2/\mathrm{var}(s_i)$ is too small, using a chi square distribution
with $m - 1$ degrees of freedom as a first approximation.


ESTIMATION OF ERROR VARIANCES OF TEST SCORES


John A. Keats*


EDUCATIONAL TESTING SERVICE


The representation of test scores as n-dimensional points leads directly 
to an estimate of error variance at a particular score level in the case of 
equivalent items. Approximations are suggested for the case of non-equivalent 
items. These approximations are compared, with satisfactory results, with em- 
pirical data prepared by Dr. Mollenkopf. 


General Representation of Test Scores 


It is convenient for present purposes to consider the possible responses
of a person to a test of $n$ items as points in a space of $n$ dimensions,

         $A = \{a_1\; a_2\; a_3 \cdots a_n\}$,

where $a_i = 0$ or $a_i = 1$. If $B$ is another point, i.e.,

         $B = \{b_1\; b_2\; b_3 \cdots b_n\}$,

then define distance from $A$ to $B$, i.e., $D_{(A,B)}$, as

         $D^2_{(A,B)} = [(a_1 - b_1)^2 + (a_2 - b_2)^2 + \cdots + (a_n - b_n)^2] = D^2_{(B,A)}$.

In particular, the score $S_A$ corresponding to $A$ is given by $S_A = D^2_{(A,O)} =
D^2_{(O,A)}$ = the square of the distance of $A$ from the origin.

It is also convenient to consider the arrangement in the $n$-dimensional
space of points having the same score value. These arrangements will form
regular $(n - 1)$-dimensional figures with $\binom{n}{s}$ vertices and with centers at
the point

         $C_s = \left(\frac{s}{n}, \frac{s}{n}, \frac{s}{n}, \cdots, \frac{s}{n}\right)$.

Since all patterns with the same score are commonly treated as equivalent,
and since $C_s$ is a type of average of such patterns, it is interesting to note
that $D_{(C_s,O)} = s/\sqrt{n}$. This follows from the fact that $D^2_{(C_s,O)}$ = sum of
squares of coordinates of $C_s$ = $n(s^2/n^2) = s^2/n$, i.e., the score values corresponding
to the centers of these figures mark off equal distances in the possible
range 0 to $n$.

The next step is to consider the probability of occurrence of patterns.
If all patterns are equally probable, then the distribution of scores will be
the binomial distribution $P(s) = \binom{n}{s} 2^{-n}$. This case is trivial, as it corresponds
to zero reliability. Of more importance is the case of equal probability of
patterns for given score values, with no other restriction on the total probability
for a particular score. In this case all items are equivalent statistically.
This case will be examined further with respect to a problem raised by
Mollenkopf [4].


* Now at the University of Queensland, Brisbane, Queensland, Australia.


The Variation of the Standard Error of Measurement 


In his 1949 article Mollenkopf [4] studied the variation of the standard 
error of measurement with test score by considering the variance of the scores 
on a half test for persons who have the same score on the total test. Mollen- 
kopf’s demonstration [4, p. 191] that this variance is in fact related to the 
average error variance for persons with the same raw score is inadequate, 
as is Rulon’s [6]. In both cases the inadequacy arises from the fact that the 
demonstrations deal with the average error variance irrespective of raw score. 
However if error variance varies with raw score, what is true on the average 
for all raw scores may not be true for each individual raw score. 

Let $s_j$ be the score of individual $j$, and the scores on two parallel halves
of the test be $s_{j1}$ and $s_{j2}$. Then

         $s_j = s_{j1} + s_{j2}$
             $= t_{j1} + e_{j1} + t_{j2} + e_{j2}$,

where $t$ stands for true score and $e$ for error.
If individual $j$ is retested many times with parallel tests,

         $\mathrm{var}(s_j) = \mathrm{var}(e_{j1} + e_{j2})$,

since $t_{j1} = t_{j2}$ = constant.
This is equal to error variance in the usual sense. But

         $\mathrm{var}(s_{j1} + s_{j2}) = 2\,\mathrm{var}(s_{j1})$.

$\mathrm{Var}(s_{j1})$ may be estimated with one degree of freedom from one application
of the test for each person. These variances may be averaged over
persons at the same score level. This procedure leads to Mollenkopf's estimate
of error variance, which is thus an estimate of error variance in the usual sense.

Consider now the case of equal probability of patterns for given score
values. If the $\binom{n}{s}$ points corresponding to score $s$ are further ordered in terms
of their distance from the point $P = (P_1\; P_2 \cdots P_{n/2} \cdots P_n)$, where $P_1$ to
$P_{n/2}$ are all 1 ($n$ is taken as even) and $P_{(n/2)+1}$ to $P_n$ are all 0, i.e., $s_P = n/2$,
then the frequency distribution of the $\binom{n}{s}$ points in terms of the number ($r$)
of unities in the first $n/2$ coordinates will be

         $\frac{\binom{n/2}{r}\binom{n/2}{s-r}}{\binom{n}{s}}$ if $s \le n/2$, and $\frac{\binom{n/2}{r}\binom{n/2}{n-s-r}}{\binom{n}{n-s}}$ if $s > n/2$.

Furthermore, this will be a conditional distribution in the bivariate distribution
obtained by plotting the score $r$ on the first $n/2$ items against score
$s - r$ on the second $n/2$ items. The condition is that total score is kept
constant. The proof of these statements is given below.

Both of the distributions given have variances [1, p. 127]

(1)      $\frac{1}{4}\,\frac{s(n - s)}{n - 1}$

and, of course, this must equal

(2)      $\frac{1}{4}\,\frac{\sum_j (s_{j1} - s_{j2})^2}{N_s}$

where $s_{j1} + s_{j2} = s$, and $N_s$ is the frequency of score $s$, since $\sum_j (s_{j1} - s_{j2})^2 =
\sum_j (2s_{j1} - s)^2 = 4 \sum_j (r_j - s/2)^2$ and since the items are equivalent
$\bar{r}_j = s/2$. Hence (2) will equal (1) when $N_s$ is arbitrarily large. The relation
which Mollenkopf wanted to obtain was

(3)      $\frac{\sum_j (s_{j1} - s_{j2})^2}{N_s}$

as a function of $s = s_1 + s_2$. This is provided by the formula

(4)      error variance = $\frac{s(n - s)}{n - 1}$

for the case of equivalent items.
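
Formula (4) can be checked by direct enumeration for small tests. The sketch below, whose function name is illustrative, takes the first $n/2$ items as one half and the remainder as the other, treats every response pattern with total score s as equally likely, and averages the squared difference between the two half scores.

```python
from fractions import Fraction
from itertools import combinations

def mean_squared_half_difference(n, s):
    """Average of (s_j1 - s_j2)^2 over all patterns with total score s (n even)."""
    half = n // 2
    total, count = 0, 0
    for ones in combinations(range(n), s):        # positions of the items answered correctly
        r = sum(1 for i in ones if i < half)      # score on the first half
        total += (r - (s - r)) ** 2
        count += 1
    return Fraction(total, count)
```

For n = 4 and s = 2, for example, the six patterns give squared differences 4, 4, 0, 0, 0, 0, with mean 4/3, which equals $s(n - s)/(n - 1)$.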
Proof of Distribution 
Consider

         $P = (P_1 = 1, P_2 = 1, P_3 = 1, \cdots, P_{n/2} = 1, P_{(n/2)+1} = 0, \cdots, P_n = 0)$

and patterns for score $s \le n/2$. Patterns which have the same number ($r$)
of ones in the first $n/2$ elements will clearly have the same number ($s - r$)
of ones in the second $n/2$ elements and thus be the same distance from $P$.
There will be

         $\binom{n/2}{r}\binom{n/2}{s-r}$

such patterns and thus the probability of a pattern with score $s$ falling into
this $r$th category is

         $P(r) = \frac{\binom{n/2}{r}\binom{n/2}{s-r}}{\binom{n}{s}}$,

since all such patterns are assumed equally probable.

Clearly this $r$th category contains all patterns corresponding to a score
$r$ on the first half of the test and a score $s - r$ on the second half and so represents
a cell in the bivariate distribution of the first half of the test against the
second. The distribution $P(r)$ is the conditional distribution obtained by
holding total score constant.

A similar proof holds for $s > n/2$, e.g., by considering the number of
zeros instead of the number of ones.


Properties of the Formula (4) for Error Variance 


From (3) and (4)

         $\frac{s(n - s)}{n - 1} = \frac{\sum_j (s_{j1} - s_{j2})^2}{N_s}$.

To obtain the average error variance $\bar{Y}$ it is necessary to multiply by
$N_s$, sum with respect to $s$, and divide by the total number of people ($N$). Then

         $\bar{Y} = \frac{1}{N(n - 1)} \sum_{s=1}^{n} N_s\, s(n - s) = \frac{1}{N} \sum_{s} \sum_{j} (s_{j1} - s_{j2})^2 = 2\sigma_1^2(1 - r_{12})$.

The extreme right-hand side can be shown to equal $\sigma^2(1 - r)$ when $r$ is the
correlation between the total test and a parallel test of the same length, i.e.,

         $\bar{Y} = \frac{\bar{s}(n - \bar{s})}{n - 1} - \frac{\sigma^2}{n - 1} = \sigma^2(1 - r)$,

from which

         $r = \frac{n}{n - 1}\left(1 - \frac{\bar{s}(n - \bar{s})}{n\sigma^2}\right)$.

This result is the Kuder-Richardson Formula 21 [2].

Lord [3], in his treatment of the error obtained from sampling items,
arrived at an estimate of variance due to this error equal to $s(n - s)/n$ if
$n$ is sufficiently large so that $s/n$ is a good estimate of true score. The present
formula can thus be regarded as a small sample estimate corresponding to
his formula.

The formula

         error variance = $\frac{s(n - s)}{n - 1}$

indicates that the error variance at a particular score level can be evaluated 
without using the reliability coefficient. This is an important result as it 
demonstrates that although the reliability coefficient varies from population 
to population for the same test, the error variance at a given score level 
remains constant. 
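
The connection between the score-level formula and Kuder-Richardson Formula 21 can be illustrated numerically: averaging $s(n - s)/(n - 1)$ over any group of scores reproduces $\sigma^2(1 - r)$ with $r$ computed from Formula 21. The function names below are illustrative, and the population form of the variance (divisor N) is assumed.

```python
from fractions import Fraction

def error_variance(s, n):
    """Error variance at score level s for a test of n equivalent items."""
    return Fraction(s * (n - s), n - 1)

def score_variance(scores):
    """Variance of the scores, with divisor N (an assumption of this sketch)."""
    mean = Fraction(sum(scores), len(scores))
    return sum((s - mean) ** 2 for s in scores) / len(scores)

def kr21(scores, n):
    """Kuder-Richardson Formula 21 from the mean and variance of the scores."""
    mean = Fraction(sum(scores), len(scores))
    return Fraction(n, n - 1) * (1 - mean * (n - mean) / (n * score_variance(scores)))

def average_error_variance(scores, n):
    """Mean of the score-level error variances over the group."""
    return sum(error_variance(s, n) for s in scores) / len(scores)
```

The identity holds for any set of scores with non-zero variance, which is why the error variance at a given score level does not depend on the reliability of the group.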

The Effect of Non-equivalence 

Whereas the formula developed is exact under the conditions stated,
no exact statement can be made in general. However, an obvious adjustment
can be made in the formula so that its average value will equal the average
error variance for some typical group. Thus for practical purposes

         error variance = $\frac{(1 - R)}{(1 - K)}\,\frac{s(n - s)}{n - 1} = k\,\frac{s(n - s)}{n - 1}$,

since

         $\bar{Y}\,\frac{1 - R}{1 - K} = \sigma^2(1 - K)\,\frac{1 - R}{1 - K} = \sigma^2(1 - R)$,

where R = reliability coefficient for some typical group,
      K = the estimate from Kuder-Richardson formula 21 for the same
          group.
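
In computational terms the adjustment amounts to a single multiplying factor. A minimal sketch follows, with a hypothetical function name and with R and K supplied by the user from some typical group.

```python
from fractions import Fraction

def corrected_error_variance(s, n, R, K):
    """Error variance at score s, scaled by k = (1 - R)/(1 - K)."""
    k = (1 - R) / (1 - K)                 # correction factor estimated from a typical group
    return k * Fraction(s * (n - s), n - 1)
```

When R equals K the factor is unity and the uncorrected value $s(n - s)/(n - 1)$ is returned.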

This average correction of Y cannot be expected to give good estimates 
of error variance in cases of extreme variation in item difficulty or of con- 
siderable heterogeneity of items. Its value can best be assessed by a com- 
parison with data such as that used by Mollenkopf [4] and [5]. The graphs 
(Figures 1-13 below) indicate the degree to which adequate representation 
is obtained. For a more adequate description of the data see [4] and [5]. 

The points (X) indicate the mean of the squared differences for five-unit 
intervals of raw score. The solid line is the curve obtained from the theo- 
retical formula. The curved broken line (---) is the second-degree curve 
of best fit presented by Mollenkopf, and the straight broken line (—-—-) is 
the straight line obtained from the assumption that error variance is constant 
at all points of raw score. For all figures the values of the mean raw score
(M), the standard deviation of raw scores ($\sigma$), the number of items (N) and
the corrected split-half reliability (R) are included for the data on which the
curves are based.

Figures 1-4 are for multiple-choice data and the corrected formula 
seems to give a reasonable representation here. Mollenkopf’s formula is 
“not nearly as good representation of the error trend as the best fitting second 
degree curve” [5, p. 5]. The corrected formula gives a progressively worse
fit as variation of item difficulty increases, as shown by cases 5-13 which
are arranged in order of increasing variation of item difficulty. The extent
of variation of item difficulty is given in Table 6, page 218, of [4].

In case 13, in fact, the parabola has a minimum, not a maximum, and 
this could not be obtained by the corrected formula. The theoretical formula- 
tion given above could produce a minimum point by proceeding to a closer 
approximation to the general case, which leads to the formula 


where $\mathrm{var}(p_s)$ is the variance of item difficulties for persons at score $s$. This
formula was obtained by analogy with the formula for the variance of the
distribution obtained by sampling from a binomial distribution with unequal
probabilities as given by Kendall [1, p. 122]. It is not particularly useful as
the computation of $\mathrm{var}(p_s)$ is tedious. Fortunately, very few tests used in
practice have sufficient spread of item difficulty to require these computations.
From the point of view of sampling of items this formula would be appropriate 
for stratified sampling, and the coarser approximation using k could be taken 
as an approximation to this case when the variance of item difficulties is small. 





Applications 


The expression s(n — s)/(n — 1) can always be used in the absence of 
group data if the assumptions made by Lord [3] in his estimates of error 
variance are considered reasonable, i.e., if the test is considered a result of 
random sampling from a population of items. 

If group data are available, then k may be estimated and the formula 
used to estimate error variance in the usual sense at a particular score level. 
As k is used as a correction factor, it might be expected that it will be fairly 
stable across groups for the same test. Figures 1 and 2 refer to the same test
given to two groups. It will be noticed that the two theoretical curves are
more stable than the two values of $\sigma^2(1 - R)$.


THE DETAILED METHOD OF OPTIMAL REGIONS*
Paul S. Dwyer


UNIVERSITY OF MICHIGAN


The detailed method of optimal regions is an extended form of the
method of optimal regions which has been found effective in solving the
personnel classification problem when the number of job categories is small.
The automatic determination of the successive values of the $v_j$, made possible
by the more exact techniques of the detailed method, provides easier solutions
for the more complex problems and provides solutions which, for the most
part, can be mechanized. In a sense the detailed method of optimal regions
is more than a detailed form of the method of optimal regions. It is essentially
a method of transformations by which the original matrix is reduced to a
matrix from which the solution is easily obtained.


1. Introduction 


The personnel classification problem [1] deals with the assignment of 
individuals to jobs, where the contribution to the common effort of each 
individual $i$ if he is placed in position $j$ is the known quantity, $c_{ij}$. Two 
recommended methods of solution are the simplex method [3] and the method 
of optimal regions [2]. The reader is referred to these references for the state- 
ment of the problem, the derivation of important properties, and descriptions 
of methods of solution. 

The method of optimal regions is especially effective when, as is common, 
the number of different positions, k, is small. The method is based on the 
determination of a constant, $v_j$, for each position. In the detailed method 
of optimal regions, more specific rules are given for determining the $v_j$. Since 
these rules demand the calculation of auxiliary matrices, the detailed method 
is especially effective with machines, but it is also recommended when non- 
trivial problems are to be worked by hand. 

Let the number of individuals to be assigned to the k positions be N, and 
let $c_{ij}$ be entries in a matrix with $N$ rows and $k$ columns. The quota, $q_j$, the 
number of men to be assigned to position $j$, is exhibited in a row at the top 
of the matrix. This matrix is illustrated in Table 1, where ten men are to be 
assigned to four positions with quotas 4, 1, 4, 1, respectively. The problem 
is to make the assignment so that the sum of the corresponding $c_{ij}$ values is 
as large as possible. 


*Much of the basic research covered in this paper was carried out while the author 
was working on the problem of personnel classification in his capacity as Consultant, 
Personnel Research Branch, Department of the Army. The author wishes to express his 
appreciation to the Department of the Army for permission to use these materials in this 
paper. The opinions expressed are those of the author and are not to be construed as official 
or as those of the Department of the Army. 
2. The Conditions of Solution 


The basic conditions of solution, fundamental to the simplex method and 
other methods as well as to the method of optimal regions, imply the existence 
of $u_i$ and $v_j$ [2, p. 20] such that 

(2.1)    $c_{ij} = u_i + v_j$ for assigned values, 
         $c_{ij} \le u_i + v_j$ for unassigned values. 

If $J_i$ denotes the position to which individual $i$ is assigned, the first expression 
of (2.1) is 

(2.2)    $c_{iJ_i} = u_i + v_{J_i}$. 

Subtracting the second expression of (2.1) from (2.2) gives a (necessary) 
condition for solution: 

(2.3)    $c_{iJ_i} - v_{J_i} \ge c_{ij} - v_j$. 

(2.3) may be called the generalized Brogden condition [2, pp. 20-21]. The 
method of optimal regions is based on the $v_j$ of (2.3). The detailed method of 
optimal regions also uses the $u_i$ of (2.2). 


3. The Determination of the Initial $v_j$ 


Given the values $c_{ij}$ and the quotas $q_j$, the first step of the detailed 
method of optimal regions (and of the method of optimal regions) is the 
determination of $v_j^{(0)}$, the initial values of $v_j$. Count out the $q_j$ largest values 
in each column $j$ and take the smallest of them. In Table 1, this process leads 
to the values $v_1^{(0)} = 29$, $v_2^{(0)} = 49$, $v_3^{(0)} = 27$, $v_4^{(0)} = 41$. Then $c_{ij} - v_j^{(0)} \ge 0$ 
for at least $q_j$ elements in column $j$. 
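
The counting rule above can be sketched in a few lines of Python. This is only an illustration: the function name `initial_v` and the 5 x 3 matrix are hypothetical (Table 1 itself is not reproduced here), and the quotas 2, 2, 1 are chosen to match that small matrix.

```python
def initial_v(c, quotas):
    """Return v_j^(0): the q_j-th largest entry of each column j of c."""
    v0 = []
    for j, q in enumerate(quotas):
        column = sorted((row[j] for row in c), reverse=True)
        v0.append(column[q - 1])  # smallest of the q_j largest values
    return v0


# Hypothetical contribution matrix c_ij (5 individuals, 3 positions).
c = [
    [23, 13, 16],
    [30, 25, 11],
    [18, 27, 20],
    [12, 22, 19],
    [26, 15, 14],
]
quotas = [2, 2, 1]  # hypothetical quotas q_j, summing to the 5 individuals

v0 = initial_v(c, quotas)

# Number of entries in each column with c_ij - v_j^(0) >= 0.
counts = [sum(1 for row in c if row[j] - v0[j] >= 0) for j in range(len(quotas))]
```

By construction each column then has at least $q_j$ entries that are not below $v_j^{(0)}$, which is the property stated in the text.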

In problems worked by hand, it is commonly useful to indicate those 
values which are equal to or greater than the $v_j^{(0)}$. Asterisks have been used 
to indicate those values. 

The $v_j^{(0)}$ may be determined with the use of punched cards. One card is 
punched for each individual, indicating the $c_{ij}$ values for all positions. The 
cards are then sorted for each position and the $v_j^{(0)}$ determined from the sorter 
card count or from a tabulator run using cumulated frequencies. 


4. The Determination of the Initial Assignment and the $u_i^{(0)}$ Values 


The initial assignment, $J_i^{(0)}$, is then made with the use of (2.3). Thus, 
in Table 1, compare $c_{ij} - v_j^{(0)}$ for successive values of $j$ for each $i$ and make 
the initial assignment $J_i^{(0)}$ to that column for which $c_{ij} - v_j^{(0)}$ is largest. 
Individual 1 is initially assigned to job category 1 since −6 is greater than 
−36, −11, or −27. In case of a tie for the largest value of $c_{ij} - v_j^{(0)}$, both 
values of $j$ are recorded in the column for $J_i^{(0)}$. 
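
A minimal Python sketch of this assignment rule follows. The function name `initial_assignment` and the small matrix are hypothetical; the single row `[23, 13, 16, 14]` is not copied from Table 1 but inferred from the differences −6, −36, −11, −27 quoted above together with the initial values 29, 49, 27, 41. Columns are indexed from 0 in the code.

```python
def initial_assignment(c, v):
    """For each row i, list every column j at which c[i][j] - v[j] is largest.

    A list with more than one column records a tie.
    """
    assignment = []
    for row in c:
        diffs = [cij - vj for cij, vj in zip(row, v)]
        best = max(diffs)
        assignment.append([j for j, d in enumerate(diffs) if d == best])
    return assignment


# Row inferred from the differences quoted in the text for individual 1.
individual_1 = initial_assignment([[23, 13, 16, 14]], [29, 49, 27, 41])

# Hypothetical 5 x 3 example with quotas 2, 2, 1 and initial values 26, 25, 20.
c = [
    [23, 13, 16],
    [30, 25, 11],
    [18, 27, 20],
    [12, 22, 19],
    [26, 15, 14],
]
quotas = [2, 2, 1]
assignment = initial_assignment(c, [26, 25, 20])

# Definite (untied) assignments per column, compared with the quotas:
# positive means an excess, negative a deficiency.
definite = [sum(1 for a in assignment if a == [j]) for j in range(len(quotas))]
excess = [n - q for n, q in zip(definite, quotas)]
```

In this hypothetical example the first column has an excess of one and the second a deficiency of one, so further transformations would be needed.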
With hand methods, many of the assignments can be made with a simpler 
rule. If there is a single asterisk in a given row, the assignment is made to the 
column in which the asterisk appears. If there are two or more asterisks in 
a given row, only those columns with asterisks need be considered in applying 
the criterion. 

The number of initial assignments to each position is then determined. 
This number is indicated by $q_j^{(0)}$ and is placed, for comparison, above the $q_j$ 
values. The first number indicates the number of definite initial assignments 
and the second the number of ties. Then form $q_j^{(0)} - q_j$, which indicates an 
excess of assignments if positive and a deficiency if negative. In Table 1 
there is an excess of two assignments in column 1 and deficiencies of single 
assignments in columns 2 and 4. 

Next determine the $u_i^{(0)}$ values. From (2.2) 

(4.1)    $u_i = c_{iJ_i} - v_{J_i}$, 

and then $u_i^{(0)} = c_{iJ_i^{(0)}} - v_{J_i^{(0)}}^{(0)}$. The results are placed in the column labelled 
$u_i^{(0)}$. In practice, it is usually convenient to determine the values of $u_i$ 
simultaneously with the values of $J_i$. 


5. The Determination of the First Transformed Matrix 
The first transformed matrix is computed using the formula 

(5.1)    $c_{ij}^{(1)} = c_{ij} - u_i^{(0)} - v_j^{(0)}$. 

Every element is either zero or negative since $u_i^{(0)}$ and $v_j^{(0)}$ are determined 
so that $c_{ij} \le u_i^{(0)} + v_j^{(0)}$. The values $c_{ij}^{(1)}$ resulting from the application of 
(5.1) to the problem of Table 1 are shown in Table 2. 

The negative signs in this matrix (and the following ones) can be elimi- 
nated by using the alternative transformation 

(5.2)    $t_{ij}^{(1)} = -c_{ij}^{(1)} = u_i^{(0)} + v_j^{(0)} - c_{ij}$. 

This is illustrated in Table 3. The value of $v_j^{(1)}$ is then the value of the $q_j$th 
smallest $t_{ij}^{(1)}$ in column $j$. The values of $J_i^{(1)}$ are then determined using 

(5.3)    $t_{iJ_i}^{(1)} - v_{J_i}^{(1)} \le t_{ij}^{(1)} - v_j^{(1)}$ 

as illustrated in Table 3. The values of $J_i^{(0)}$ are indicated by the zero values 
of $t_{ij}^{(1)}$. The summary values $q_j^{(0)}$ and $q_j^{(1)}$ are recorded in the top rows. Ex- 
amination shows that the transformation process is not yet completed since 
there is an excess of at least one assignment in position 1. Hence an additional 
transformation is carried out. 
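
The first transformation can be illustrated with a short Python sketch. It takes $u_i^{(0)}$ as the largest value of $c_{ij} - v_j^{(0)}$ in row $i$, which is what (4.1) gives for the initially assigned column, and then forms the non-negative matrix of (5.2). The function name `first_transform` and the data are hypothetical, not taken from Tables 1 to 3.

```python
def first_transform(c, v):
    """Return (u, t) with u_i = max_j (c_ij - v_j) and t_ij = u_i + v_j - c_ij."""
    u = [max(cij - vj for cij, vj in zip(row, v)) for row in c]
    t = [[ui + vj - cij for cij, vj in zip(row, v)] for row, ui in zip(c, u)]
    return u, t


# Hypothetical contribution matrix and initial values v_j^(0).
c = [
    [23, 13, 16],
    [30, 25, 11],
    [18, 27, 20],
    [12, 22, 19],
    [26, 15, 14],
]
v0 = [26, 25, 20]

u0, t1 = first_transform(c, v0)
```

Every entry of `t1` is zero or positive, and the zero entries mark the initial assignments, as described for Table 3.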


6. The Determination of Successive Transformations 


Since the values of $v_j^{(1)}$ are available, only the values $u_i^{(1)}$ are needed to 
complete the transformation. Now 

(6.1)    $u_i^{(1)} = t_{iJ_i^{(1)}}^{(1)} - v_{J_i^{(1)}}^{(1)}$, 

and the next transformation is given by 

(6.2)    $t_{ij}^{(2)} = t_{ij}^{(1)} - u_i^{(1)} - v_j^{(1)}$. 


The application of this transformation to the matrix of Table 3 leads to the 
matrix of Table 4. The symbol $\theta$ is used for each of the zero terms appearing 
in the same row. Thus the ties of Table 3 are indicated by the $\theta$'s of Table 4. 

The values of $q_j^{(1)}$ show an excess of at least 1 in column 1. Hence one 
of the men tentatively assigned to column 1 must be assigned to one of the 
other columns. This is accomplished by subtracting from column 1 the 
smallest non-zero entry in any of the rows corresponding to individuals 
tentatively assigned to position 1. In Table 4, this value is 2; so $v_1^{(2)} = -2$. 
The remaining values of $v_j^{(2)}$ are 0, but they need not be recorded since nothing 
is to be subtracted. 

The values of $J_i^{(2)}$ are then determined and the summary $q_j^{(2)}$ values. 
There are no excesses or deficiencies indicated either in the single columns 
or in the combinations of columns. The obvious assignment of ties leads to 
the set of $J_i$ values identifying the solution. 

In some cases it is necessary to make transformations on combinations 
of columns, since the method leads to a solution only when every combination 
of columns, as well as each column separately, has no deficiency [4, p. 16]. 
The technique for finding a suitable transformation when there is a de- 
ficiency in several columns differs slightly from that described above. In 
Table 3, note that an excess in column 1 indicates a deficiency in columns 
2, 3 and 4. Indeed, a summary of the $J_i^{(1)}$ column shows only five men with 
0 in columns 2, 3 or 4. But $q_2 + q_3 + q_4 = 6$. Hence there is a deficiency of 
1 in this subset. A common positive amount can be subtracted from each of 
these columns to introduce an additional term, provided the negative of 
this amount is subtracted from every row which has at least one zero term 
in columns 2, 3, 4. In this way the tentative assignments to the columns 
having a net deficiency are maintained, while at least one new assignment 
is added to these columns. The amount to subtract from the columns is the 
smallest (non-zero) number in those columns which is not in a row tentatively 
assigned to column 2, column 3 or column 4. In this way the transformation 
leads to a matrix having the desired property that every element is non- 
negative. 

In Table 4, this number is 2; so $v_2^{(2)} = v_3^{(2)} = v_4^{(2)} = 2$ with $v_1^{(2)} = 0$. These values 
of $v_j^{(2)}$ lead to values of $J_i^{(2)}$ which are identical with those of Table 4. The 
two transformations are essentially equivalent transformations since they 
lead to the same matrix. This is the $t_{ij}^{(3)}$ matrix of Table 5. Assignments 
satisfying the quotas can be made to the zero terms of this matrix. 
The method is designed, at each step, to decrease the number of de- 
ficiencies in some particular column or combination of columns without 
increasing the number of deficiencies in the remaining columns. The method 
necessarily converges since the total number of deficiencies is finite and 
since a sufficient condition for solution is an assignment with no deficiencies 
in any column or combination of columns [4, p. 16]. The process converges 
very rapidly in the common case in which the number of job categories is 
small. Experience has led to the empirical conclusion that, for small values 
of k, the number of transformations required for solution is approximately 
the value of k. Once the row deviates described in the next section are avail- 
able, the number of transformations required is commonly less than k/2. 


7. The Use of Row Deviates 


A device which is useful in speeding the convergence of the method is 
the use of row deviates. Any constant may be subtracted from any row 
without changing the solution since (2.3) is not changed by subtracting a 
constant from $c_{iJ_i}$ and from $c_{ij}$. Subtraction of the mean of the row from 
each element in the row results in row deviates from the mean. Preferably 
one may use large row deviates defined by 

(7.1)    $C_{ij} = k(c_{ij} - \bar{c}_i) = k c_{ij} - \sum_{j=1}^{k} c_{ij} = k c_{ij} - c_{i\cdot}$, 

where $c_{i\cdot}$ and $\bar{c}_i$ are, respectively, the sum and mean for row $i$. 

The matrix of row deviates is then treated by the method described 
above. In the illustration used above the values of $U_i^{(0)}$ and $V_j^{(0)}$ obtained 
from the $C_{ij}$ matrix are almost adequate for determining the solutions. This 
is shown in Table 6. Only a slight additional adjustment is necessary in 
column 4. The advantage of the use of the large row deviate transformation 
may be seen from the fact that the columns of the $C_{ij}$ matrix are generally 
uncorrelated or slightly negatively correlated so that large values in one 
column are not apt to be accompanied by large values in some other column. 
The values of $J_i$ in Table 6 are identical with those of Table 4. 
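
The invariance claimed for row deviates can be checked by brute force on a small case. The Python sketch below is illustrative only: the function names and the 5 x 3 matrix are hypothetical, and exhaustive enumeration is used purely as a check, not as a substitute for the method of the paper.

```python
from itertools import product


def large_row_deviates(c):
    """Return C_ij = k * c_ij - (row sum), as in (7.1)."""
    k = len(c[0])
    return [[k * cij - sum(row) for cij in row] for row in c]


def best_assignments(c, quotas):
    """Enumerate all assignments meeting the quotas; return (best sum, optimal assignments)."""
    n, k = len(c), len(quotas)
    best_total, best = None, []
    for cols in product(range(k), repeat=n):
        if any(cols.count(j) != quotas[j] for j in range(k)):
            continue  # quotas not met exactly
        total = sum(c[i][j] for i, j in enumerate(cols))
        if best_total is None or total > best_total:
            best_total, best = total, [cols]
        elif total == best_total:
            best.append(cols)
    return best_total, best


# Hypothetical contribution matrix and quotas.
c = [
    [23, 13, 16],
    [30, 25, 11],
    [18, 27, 20],
    [12, 22, 19],
    [26, 15, 14],
]
quotas = [2, 2, 1]

C = large_row_deviates(c)
_, optimal_original = best_assignments(c, quotas)
_, optimal_deviates = best_assignments(C, quotas)
```

Because each row is changed only by a positive scale factor common to all rows and a row constant, the two sets of optimal assignments coincide.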


8. Solution of a Problem in the Frequency Form 


An illustration is next presented with k = 5 and in which it is necessary 
to analyze subsets of columns even though (large) deviates are used. For 
this purpose a frequency-form problem which Votaw and Dailey [4, p. 7] 
have worked with the simplex method is examined. A frequency-form problem 
results from the grouping of individual categories so that frequencies ($f_i$) 
as well as quotas ($q_j$) appear. The number of personnel categories is $n$. 

The $n = 4$ values of $f_i$, as well as the $k = 5$ values of $q_j$, are shown in 
the first matrix of Table 7. The values of $c_{i\cdot}$ are first computed and then the 
values $C_{ij}$ are recorded in the second matrix. In determining the values 
$V_j^{(0)}$ consider the frequencies associated with each row. Thus $V_1^{(0)} = -1$, 
since the 12 + 23 values of −1 in column 1 are more than ample for the 
quota of 15. The values of $J_i^{(0)}$ are then obtained with the generalized Brogden 
condition. It is at once apparent that the columnar quotas can be met in- 
dividually but that there is a deficiency in the subset of columns 1, 2, 5, 
since the 12 + 23 = 35 men available cannot fill the 15 + 20 + 12 = 47 
jobs. A transformation is in order. 

The values, $U_i^{(0)} = 0$, are computed and then the values $T_{ij}^{(1)}$ appearing 
in the third matrix. The values of $J_i^{(0)}$ summarize the zero terms. The de- 
ficiency in the subset consisting of columns 1, 2, 5 can be met after the matrix 
is reduced by subtracting some quantity from each of these columns to 
admit more zeros in the columns. The quantity to be subtracted is the smallest 
non-zero quantity in the rows not tentatively assigned to columns 1, 2, or 5. 
This quantity is 1; so $V_1^{(1)} = V_2^{(1)} = V_5^{(1)} = 1$ and, of course, $V_3^{(1)} = V_4^{(1)} = 0$. 

The values $J_i^{(1)}$ are then determined. The number of available assign- 
ments in each row is so large that the frequencies and quotas can be 
satisfied in many different ways. 

The additional transformation indicated by the values of $V_j^{(1)}$ is made so 
that the $T_{ij}^{(2)}$ matrix results. This transformation is not necessary to the 
solution, since a solution can be obtained from the last column of the third 
matrix, but the solution may also be obtained by making assignments to the 
zero terms of the last matrix in any way so as to satisfy the quotas and 
frequencies. 


9. The Determination of $u_i$ and $v_j$ 


It is now possible to determine the values of $u_i$ and $v_j$ of (2.1). If 
$t_{ij}$ represents the final transform, obtained after the transformation 
using $u_i^{(m)}$ and $v_j^{(m)}$, then 

(9.1)    $t_{ij} = 0$ for assigned values, 
         $t_{ij} \ge 0$ for unassigned values. 

Consider first the case in which the transformations are applied to the $c_{ij}$ 
matrix without using row deviates. Then 

(9.2)    $t_{ij} = u_i^{(0)} + v_j^{(0)} - c_{ij} - (u_i^{(1)} + v_j^{(1)} + \cdots + u_i^{(m)} + v_j^{(m)})$, 

so that 

(9.3)    $u_i = u_i^{(0)} - u_i^{(1)} - u_i^{(2)} - \cdots - u_i^{(m)}$, 
         $v_j = v_j^{(0)} - v_j^{(1)} - v_j^{(2)} - \cdots - v_j^{(m)}$. 

The values of $u_i$ and $v_j$ for the problem of Table 1 were computed using 
(9.3) and are shown in the last column and row of Table 1. 

The determination of $u_i$ and $v_j$ for problems using the large row deviate 
transformation is more involved. If the $u_i$ and $v_j$ appropriate to the $C_{ij}$ 
matrix are $U_i$ and $V_j$, a set of non-negative values of $v_j$ can be determined 
from 

         $v_j = (V_j - V_{\min})/k$, 

where $V_{\min}$ is the smallest $V_j$. Thus in Table 6, the values of $V_j$ are 19, 66, 
−1, 43; so the values of $v_j$ are 5, 16 3/4, 0, 11. Again, in Table 7, the values of 
$V_j$ are −2, 3, 8, 8, 3; so the values of $v_j$ are 0, 1, 2, 2, 1. Other sets of $v_j$ can 
be obtained by adding constants. 
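
The two numerical cases quoted above can be reproduced with a short Python sketch. The function name `rescale_v` is hypothetical; exact fractions are used so that 16 3/4 appears as 67/4 rather than as a rounded decimal.

```python
from fractions import Fraction


def rescale_v(V, k):
    """Return v_j = (V_j - smallest V_j) / k as exact fractions."""
    smallest = min(V)
    return [Fraction(Vj - smallest, k) for Vj in V]


# Values of V_j quoted in the text for Table 6 (k = 4) and Table 7 (k = 5).
v_table6 = rescale_v([19, 66, -1, 43], 4)
v_table7 = rescale_v([-2, 3, 8, 8, 3], 5)
```

Subtracting the smallest $V_j$ makes every $v_j$ non-negative, and division by $k$ undoes the scaling introduced by the large row deviates.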


10. The Determination of the Assignment Sum 


The assignment sum can be determined by applying the assignments 
for each row to the original $c_{ij}$ matrix. This is illustrated in Table 5; the 
values of $c_{iJ_i}$ are listed for each $i$, and the sum is 315 units. An alternative 
method is based on the formula 

(10.1)    $\sum c_{iJ_i} = (\sum u_i^{(0)} + \sum q_j v_j^{(0)}) - (\sum u_i^{(1)} + \sum q_j v_j^{(1)}) - \cdots - (\sum u_i^{(m)} + \sum q_j v_j^{(m)})$. 

The values in parentheses are given in the lower right corner of the respective 
matrices. If a problem in the frequency form is used, the values of $\sum u_i^{(r)}$ are 
replaced by $\sum f_i u_i^{(r)}$. If large row deviates are used, the appropriate formula 
is 

(10.2)    $\sum c_{iJ_i} = \frac{1}{k}\{\sum c_{i\cdot} + (\sum U_i^{(0)} + \sum q_j V_j^{(0)}) - (\sum U_i^{(1)} + \sum q_j V_j^{(1)}) - \cdots - (\sum U_i^{(m)} + \sum q_j V_j^{(m)})\}$. 

Thus in Table 6, 

          $\sum c_{iJ_i} = (1050 + 211 - 1)/4 = 315$ units. 





11. Interpretation of the Method 


In a sense the detailed method of optimal regions is more than a detailed 
form of the method of optimal regions. For the former, specific rules are given 
for determining the successive increments to the v; . It is essentially a method 
of reduced matrices in which an original matrix is transformed to a reduced 
matrix from which the assignment can be determined from the zero terms. 
The method is especially effective, particularly when using large row deviates, 
in solving non-trivial personnel assignment problems with a small number of 
positions. 
Although simple structure has proved to be a valuable principle for 
rotation of axes in factor analysis, an oblique factor solution often tends to 
confound the resulting interpretation. A model is presented here which 
transforms the oblique factor solution so as to preserve simple structure and, 
in addition, to provide orthogonal reference axes. Furthermore, this model 
makes explicit the hierarchical ordering of factors above the first-order 
domain. 


The purpose of this paper is to present a procedure for transforming an 
oblique factor analysis solution containing a hierarchy of higher-order 
factors into an orthogonal solution which not only preserves the desired 
interpretation characteristics of the oblique solution, but also discloses the 
hierarchical structuring of the variables. 

Oblique simple structure was proposed by Thurstone as a factor model 
useful for psychological research because of the simplicity with which inter- 
pretation could be made from a set of linear components underlying a set of 
scores. His argument is convincing when consideration is given to his ‘‘box 
problem” [9, pp. 140-146] for the factor loadings readily identify the dimen- 
sions of the boxes. In many studies, correlations among the reference axes 
make interpretation of simple structure difficult or questionable. In such 
cases usual methods of transformation from oblique to orthogonal axes fail 
to clarify the nature of the underlying parameters because many of the 
vanishing factor loadings become non-vanishing, thereby destroying simple 
structure. If one is willing to disavow the principle of parsimony of common 
factors, one may employ the type of factor solution outlined in this paper. 
This solution not only furnishes simple structure on orthogonal reference 
axes, but also provides a more complete rationale of the structuring of 
psychological traits than that given by (i) a conventional oblique solution or, for 
that matter, (ii) a solution in which the number of common factors is equal 
to the rank of the reduced correlation matrix. 

*Grateful acknowledgment is given to Dr. Lloyd G. Humphreys for his encourage- 
ment and valuable suggestions in the development of this task. This investigation was 
carried out under the Air Force Personnel and Training Research Center program in 
support of Project Nos. 7702 and 7950. Permission is granted for reproduction, translation, 

It seems reasonable to assume that psychological behavior may be 
conceived as functioning at different levels of complexity. That is, a complex 
behavior activity might be thought of as an assembly of progressively less 
complex levels of activity—each level may have semantic, psychological, or 
practical meaning. For example, Vernon [11, pp. 22-24] reports that the 
mental structure of a group of British Army and Navy recruits was examined 
with a battery of cognitive tests. As determined from the sign pattern of 
centroid factor loadings, one general factor was found to be present in all 
tests. This factor was designated as g. With the elimination of g, the battery 
could be fractionated into two main groups of tests: academic and practical. 
In turn, the academic factor could be broken into verbal, numerical, and 
educational factors; the practical factor could be broken into mechanical, 
spatial, and physical factors. This structuring of the tests into a hierarchy 
of factors has many recommendable features—it provides information about 
the classification of tests and the behaviors measured by them in varying 
orders of concurrence and dependence. Had this particular centroid solution 
been rotated to an oblique solution, the hierarchical ordering would have 
been lost or rendered uncertain. 

Structuring of tests into a hierarchical pattern is not a new consideration. 
Holzinger’s bi-factor solution is a special case in which one second-order 
factor overlays the first-order group factors. Burt [1, 2, 3] has strongly 
advocated the hierarchical model for many years. His group factor method, 
which yields this hierarchy, proceeds by successive grouping of variables 
according to their sign pattern in a centroid solution. The procedure set 
forth in this paper, however, is an elaboration of the procedure demonstrated 
by Thompson [8, pp. 297-302] and Thurstone [9, pp. 411-439]. It differs 
from Burt’s not in the product but in the process. The hierarchical solution 
is shown to be a consequence of successively obtained higher-order factor 
solutions. A necessary condition is the existence of simple structure at each 
level. If oblique simple structure exists, it can be recast into a hierarchical 
pattern similar in kind to that which Vernon inferred from the centroid 
solution. It will be seen that the characteristics of simple structure are 
retained not only at the level of the first-order factors but also at all levels. 


Mathematical Rationale 


The mathematical rationale for the model outlined in the paper is 
derived from Tucker’s [10] generalization of the fundamental factor theorem 
stated by Thurstone [9, p. 78]. This theorem states that a correlation matrix, 
R, may be decomposed into correlated common factors and unique factors. 


(1)    $R = P \phi P' + U^2$, 

where $P$ represents the coordinates of the vector representation of the vari- 
ables on oblique Cartesian reference axes or factors, $\phi$ represents the inter- 
correlations among the oblique reference axes, and $U$ represents the unique 
factor coefficients. It is readily seen that if $\phi$ is the identity matrix, the 
fundamental factor theorem of Thurstone results. 

A second theorem used in this development also stems from Tucker’s 
article. He shows that if the intercorrelations among the factors, $\phi$, can be 
decomposed as 

(2)    $\phi = HH'$, 


then the oblique factors, P, may be transformed into orthogonal factors, F, 
according to the operation 


(3) PH = F. 


That is, the coordinates of the variables represented as vectors may be 
transformed from oblique to orthogonal reference axes. Each row of H 
represents the direction cosines of the oblique axes with respect to the ortho- 
gonal axes developed by the decomposition. 

Guttman [4] demonstrates that if a matrix of intercorrelations, $\phi$, is 
factored as in (2) no matter how H is built up, the reference axes are ortho- 
gonal. The factoring or decomposition may involve any of a variety of 
procedures, such as the diagonal or square root method of factoring, the 
centroid procedure, or the method of principal axes. 
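
As a small numerical illustration of (2) and (3), the Python sketch below factors a matrix of factor intercorrelations by the square root (Cholesky) method, one of the procedures named above, and checks that the transformed pattern $F = PH$ reproduces $P \phi P'$. The helper names and the 3 x 2 pattern and 2 x 2 correlation matrix are hypothetical and are not taken from the paper's tables.

```python
import math


def cholesky(a):
    """Lower-triangular H with a = H H' (square root method)."""
    n = len(a)
    h = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(h[i][m] * h[j][m] for m in range(j))
            if i == j:
                h[i][j] = math.sqrt(a[i][i] - s)
            else:
                h[i][j] = (a[i][j] - s) / h[j][j]
    return h


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


# Hypothetical oblique pattern P and factor intercorrelations phi.
P = [[0.8, 0.0], [0.0, 0.6], [0.5, 0.3]]
phi = [[1.0, 0.5], [0.5, 1.0]]

H = cholesky(phi)            # phi = H H', equation (2)
F = matmul(P, H)             # F = P H, equation (3)

common_oblique = matmul(matmul(P, phi), transpose(P))   # P phi P'
common_orthogonal = matmul(F, transpose(F))             # F F'
```

The two products agree to rounding error, which is the sense in which the orthogonal factors $F$ carry the same common-factor information as the oblique solution.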

The development of the hierarchical model utilizes these propositions. 
In the following discussion, $P_i$ will refer to the primary factor pattern of 
the $i$th order variables or factors; that is, the coordinates of the vector repre- 
sentation of the variables on the $i$th order oblique reference axes. $R_i$ will be 
used to designate the intercorrelations among the $i$th primary factor reference 
axes. $U_i$ will represent the unique $i$th order variables or factors. 

At the outset, the initial correlation matrix, R, is decomposed according 
to (1) as follows: 


(4)    $R = P_1 R_1 P_1' + U_1^2$. 

In like manner, $R_1$ is decomposed 

(5)    $R_1 = P_2 R_2 P_2' + U_2^2$. 

In turn, $R_2$ is decomposed 

(6)    $R_2 = P_3 R_3 P_3' + U_3^2$. 

Each higher-level matrix of intercorrelations among primary factors is 
decomposed in this fashion until $R_i$ becomes the identity matrix, which 
implies that the $i$th order primary factors are orthogonal. That is, 

(7)    $R_{i-1} = P_i P_i' + U_i^2$. 

In many cases, $R_i$ becomes a unit scalar and $P_i$, therefore, is merely a column 
matrix. Elementary matrix manipulation permits (7) to be rewritten as a 
product of a supermatrix and its transpose: 

(8)    $R_{i-1} = [P_i : U_i] \cdot [P_i : U_i]'$. 

Designating the supermatrix, $[P_i : U_i]$, by $B_i$, according to (3), the $(i - 1)$th 
order primary factors, $P_{i-1}$, can be made orthogonal by the operation 
$P_{i-1} B_i$. 
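
The matrix manipulation behind (8) is short enough to write out. The step below is a sketch that assumes, as in the text, that $U_i$ is a diagonal matrix, so that $U_i U_i' = U_i^2$:

```latex
[P_i : U_i]\,[P_i : U_i]'
  = [P_i : U_i]
    \begin{bmatrix} P_i' \\ U_i' \end{bmatrix}
  = P_i P_i' + U_i U_i'
  = P_i P_i' + U_i^2
  = R_{i-1}.
```

The last equality is (7), so the partitioned matrix $[P_i : U_i]$ plays the role of $H$ in (2).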

However 

(9)    $R_{i-2} = P_{i-1} R_{i-1} P_{i-1}' + U_{i-1}^2$. 

Therefore, it follows that $R_{i-2}$ may be rewritten as a product of a new super- 
matrix and its transpose: 

(10)    $R_{i-2} = [P_{i-1} B_i : U_{i-1}] \cdot [P_{i-1} B_i : U_{i-1}]'$. 

This new supermatrix may be designated as $B_{i-1}$. By virtue of (2) and 
Guttman’s demonstration [4], orthogonal reference axes are obtained. Fur- 
thermore, $B_{i-1}$ serves to rotate the primary pattern, $P_{i-2}$, to this orthogonal 
reference framework. Continuing this process to the lowest-order level, the 
initial primary or first-order factors, $P_1$, are orthogonalized by the operation 
$P_1 B_2$. Designate $P_1 B_2$ as $B$ instead of $B_1$ since one is usually not concerned 
with explicitly appending the diagonal matrix of unique factors to the common 
factor solution. $B$, then, is the hierarchical solution. Since 


(11) R (with communalities) = BB’, 


B represents coordinates of the test variables on orthogonal axes. 

In the development of a hierarchical solution, careful attention should 
be paid to the distinction between simple structure and primary pattern. 
This distinction has been clearly drawn and illustrated by Harris and Knoell 
[5]. The hierarchical solution is contingent upon the development of a primary 
pattern at each level. This primary pattern, however, may be obtained from 
the simple structure, which is computed either graphically or analytically. 
Once simple structure is identified, it may easily be converted to primary 
pattern [5] by the operation 
(12)    $P_i = V_i (R_i^{-1})^{1/2}$, 

where $P_i$ is primary pattern, $V_i$ is simple structure, and $(R_i^{-1})^{1/2}$ is the matrix 
of the reciprocals of the direction cosines between each primary axis and its 
own simple structure reference axis. $(R_i^{-1})^{1/2}$ is obtained by taking the square 
roots of the diagonal elements only of $R_i^{-1}$. 














Procedure 





To demonstrate the procedure for rotating an oblique simple structure 
into a hierarchical factor solution, a correlation model was constructed from 
TABLE 1 

Correlation Matrix, R* 

(The body of this 12 x 12 table is not recoverable from the source.) 

*Communalities appear in the principal diagonal. Decimal points have been 
omitted. 
TABLE 2 

Primary Pattern, $P_1$ 

         I      II     III    IV     V      VI 
  1     .8 
  2     .9 
  3            .7 
  4            .6 
  5                   .8 
  6                   .4 
  7                          .7 
  8                          .2 
  9                                 .9 
 10                                 .5 
 11                                        .6 
 12                                        .7 


TABLE 3 

Intercorrelations of Primary Factors, $R_1$ 

         I       II      III     IV      V       VI 
 I     1.0000   .5600   .1536   .2304   .4032   .1344 
 II     .5600  1.0000   .1344   .2016   .3528   .1176 
 III    .1536   .1344  1.0000   .2400   .1512   .0504 
 IV     .2304   .2016   .2400  1.0000   .2268   .0756 
 V      .4032   .3528   .1512   .2268  1.0000   .2700 
 VI     .1344   .1176   .0504   .0756   .2700  1.0000 


TABLE 4 

Second-Order Primary Factors, $P_2$ 

         I      II     III 
  1     .8 
  2     .7 
  3            .4 
  4            .6 
  5                   .9 
  6                   .3 


TABLE 5 

Correlations Among Second-Order Primary Factors, $R_2$ 

         I       II      III 
 I     1.0000   .4800   .5600 
 II     .4800  1.0000   .4200 
 III    .5600   .4200  1.0000 
a postulated simple structure factor matrix. It should be emphasized, how- 
ever, that any set of empirical variables which can be rotated to simple 
structure can also be put in this more interpretable and meaningful hierarchical 
form. That is, if simple structure exists by any definition for a set of variables, 
the procedure is applicable. The given correlation matrix is presented in 
Table 1. An oblique solution was developed by the multiple-group method [6]. 
This oblique solution consists of a primary pattern, $P_1$, and intercorrelations 
among the primary factors, $R_1$. These two matrices are presented in Tables 
2 and 3. An oblique solution could have been produced by rotation from a 
centroid solution or by some analytic method instead of the multiple-group 
procedure. However, the method of arriving at the oblique solution is of 
little consequence for our purposes, and the grouping procedure was thought 
to be the most expeditious here. If oblique simple structure, $V_1$, had been 
produced, $P_1$ could be obtained quite readily by the operation indicated in 
(12). Regardless of methodology, the final rotated oblique solution should 
be transformed into a primary pattern, $P_1$, as defined by Holzinger and 
Harman [7, chap. XI]. 

The intercorrelations of the primary factors, $R_1$ (with communalities 
determined and placed in the diagonal elements), are then factored by any 
method. Usually it is most expeditious to carry out a common-factor analysis 
at each stage to separate the common-factor space from the unique-factor 
space. Rotation of these second-order factors is then performed to obtain 
the primary pattern of the second-order factors, $P_2$ (Table 4), and the 
intercorrelations of the second-order primary factors, $R_2$ (Table 5). A check 
may be made at this point since $R_1$ (with communalities) $= P_2 R_2 P_2'$. Again 
this $P_2$ may be developed by the construction of an oblique simple structure, 
$V_2$, which is then transformed into $P_2$ by the operation indicated in (12). 

Since the second-order factors, $P_2$, are correlated, it is obvious that a 
third-order factor exists. Consequently, $R_2$ is factored. Factoring shows that 
there is one third-order factor and three unique factors, $B_3$ (see Table 6). 
The progressive factoring of higher orders is now complete. This information 
is used for developing the preferred hierarchical factor solution. To do this, 
the operation $P_2 B_3$ is performed (Table 7) and the matrix of unique factors 
of $R_1$, $U_2$, is appended as shown in Table 8. That is, $B_2 = [P_2 B_3 : U_2]$. It 
should be noted that $B_2 B_2' = R_1$ (with unities in the diagonal of $R_1$). This 
matrix, $B_2$, is used as the transformation matrix for rotating the first-order 
oblique solution, $P_1$, into the final hierarchical solution, $B$ (Table 9), according 
to the operation 

(13)    $B = P_1 B_2$. 
This procedure may be extended to higher orders if correlations are found 


among fourth-order or higher-order factors. 
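
The chain of operations in this section can be followed numerically with a short Python sketch. It is an illustration under stated assumptions: the loadings are read from Tables 2, 4 and 6 as best they can be recovered, every variable is taken to load on a single factor at each level (complexity one), and the helper names `pattern`, `uniqueness`, `hstack`, `diag` and `matmul` are hypothetical.

```python
import math


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def hstack(a, b):
    """Append the columns of b to those of a (the partition [a : b])."""
    return [ra + rb for ra, rb in zip(a, b)]


def diag(values):
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


def pattern(entries, n_factors):
    """Pattern with one non-zero loading per row; entries = [(factor, loading), ...]."""
    rows = []
    for factor, loading in entries:
        row = [0.0] * n_factors
        row[factor] = loading
        rows.append(row)
    return rows


def uniqueness(entries):
    """Diagonal unique-factor matrix, valid for complexity-one rows and unit diagonals."""
    return diag([math.sqrt(1.0 - loading ** 2) for _, loading in entries])


# Loadings as recovered from Tables 2, 4 and 6 (factor indices start at 0).
first = [(0, .8), (0, .9), (1, .7), (1, .6), (2, .8), (2, .4),
         (3, .7), (3, .2), (4, .9), (4, .5), (5, .6), (5, .7)]
second = [(0, .8), (0, .7), (1, .4), (1, .6), (2, .9), (2, .3)]
third = [(0, .8), (0, .6), (0, .7)]

P1 = pattern(first, 6)
P2 = pattern(second, 3)
B3 = hstack(pattern(third, 1), uniqueness(third))   # [P3 : U3], Table 6
B2 = hstack(matmul(P2, B3), uniqueness(second))     # [P2 B3 : U2], Table 8
B = matmul(P1, B2)                                  # equation (13), Table 9
```

Under these assumptions the first row of `B` comes out as .5120, .3840 and .4800 in columns I, II and V, in agreement with Table 9, and the sum of squares of each row equals the communality of the corresponding test.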
It will be observed that this hierarchical solution contains 10 common 
TABLE 6 

Third-Order Common and Unique Factors, $B_3$ 

        $P_3$            $U_3$ 
         I       II      III     IV 
  1     .8000   .6000 
  2     .6000           .8000 
  3     .7000                   .7141 


TABLE 7 

Orthogonalized Second-Order Common Factors, $P_2 B_3$ 

         I       II      III     IV 
  1     .6400   .4800 
  2     .5600   .4200 
  3     .2400           .3200 
  4     .3600           .4800 
  5     .6300                   .6427 
  6     .2100                   .2142 


TABLE 8 

Orthogonalized Second-Order Common and Unique Factors, $B_2$ 

                  $P_2 B_3$                               $U_2$ 
         I       II      III     IV      V       VI      VII     VIII    IX      X 
  1     .6400   .4800                   .6000 
  2     .5600   .4200                           .7141 
  3     .2400           .3200                           .9165 
  4     .3600           .4800                                   .8000 
  5     .6300                   .6427                                   .4359 
  6     .2100                   .2142                                           .9539 


TABLE 9 

Hierarchical Factor Solution, $B$ 

         I       II      III     IV      V       VI      VII     VIII    IX      X 
  1     .5120   .3840                   .4800 
  2     .5760   .4320                   .5400 
  3     .3920   .2940                           .4999 
  4     .3360   .2520                           .4285 
  5     .1920           .2560                           .7332 
  6     .0960           .1280                           .3666 
  7     .2520           .3360                                   .5600 
  8     .0720           .0960                                   .1600 
  9     .5670                   .5784                                   .3923 
 10     .3150                   .3214                                   .2180 
 11     .1260                   .1285                                           .5723 
 12     .1470                   .1499                                           .6677 




factors, where all tests define factor I. Factors II, III, and IV are the next 
most complex factors. Each of these in turn can be broken down into the 
finer composites illustrated by factors V through X. These last six factors 
identify the six factors of the original oblique solution, P, . It will be observed 
that this solution reproduces the communalities and the off-diagonal cor- 
relations of the original correlation matrix exactly. Furthermore, it furnishes 
the same factorial interpretation as is found in the oblique solution, P, , 
which is the usual type of solution obtained by researchers. Ease of psycho- 
logical interpretation has not been sacrificed by the use of the hierarchical 
solution, and what was concealed in the intercorrelations of the oblique 
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axes now takes on added meaning in terms of the progressive groupings of 
the variables at higher levels. 

It should be emphasized that even though the oblique solution, $P_1$,
contains variables of complexity one only, this is not a restriction. Variables
of any complexity may be used.


Discussion 


A question arises about the stability of the hierarchical solution upon 
modification of the battery of tests. Burt concludes [3, p. 70] that the hier- 
archical solution—designated by him as the group-factor solution—remains 
“stable, if not absolutely invariant, even when the battery of tests or traits 
is modified, e.g., when a comparatively small battery is enlarged by the 
addition of more tests or more groups of tests, or when a large battery is 
curtailed by the omission of tests.” The introduction of a new group of tests
which are unrelated to any group already in the battery would, of course, 
add a new group factor. 

In all probability, selection, univariate and multivariate, and sampling 
variation would affect this model in the same manner as the simple structure 
model. These points concerning battery modification, selection, and sampling 
stability need further research for clarification. 

Practical applications of this model will be greatly aided as more objective 
and analytical criteria and techniques for transformation to simple structure 
are achieved. Nevertheless, even with present methods of attaining simple 
structure, the hierarchical solution is useful. 


Summary of Steps as Applied to Illustration 


1. $R$, with communalities, was factored into $P_1$ and $R_1$ (Tables 1, 2,
and 3), that is

$R$ (with communalities) $= P_1 R_1 P_1'$.

2. $R_1$, with communalities, was factored into $P_2$ and $R_2$ (Tables 4 and 5),
that is

$R_1$ (with communalities) $= P_2 R_2 P_2'$,

$R_1$ (with unities) $= P_2 R_2 P_2' + U_2^2$,

where $U_2$ represents the diagonal matrix of unique factors of $R_1$.

3. $R_2$, with communalities, was factored into $P_3$ (Table 6). One common
factor was found, i.e., $R_3$ was a unit scalar.

$R_2$ (with communalities) $= P_3 P_3'$,

$R_2$ (with unities) $= P_3 P_3' + U_3^2$.

4. When only one common factor remains, as in this illustration, factoring
of the higher-order matrices is completed. Otherwise, the procedure would be
continued until $R_i$ becomes an identity matrix or a single highest-order
factor is found. At this stage, these intermediate matrices are used for
constructing a rotation matrix for transforming the primary pattern, $P_1$, into
a hierarchical solution, $B$.

5. Form matrix $B_3$ by appending the unique-factor loadings of $R_2$ to
$P_3$, that is

$B_3 = [P_3 : U_3]$. (Table 6).

It follows that

$R_2$ (with communalities) $= P_3 P_3'$,

$R_2$ (with unities) $= B_3 B_3'$.

6. Carry out the matrix operation $P_2 B_3$. (Table 7).

7. Form matrix $B_2$ by appending the unique-factor loadings of $R_1$ to
$P_2 B_3$, that is

$B_2 = [P_2 B_3 : U_2]$. (Table 8).

It follows that

$R_1$ (with communalities) $= P_2 B_3 B_3' P_2'$,

$R_1$ (with unities) $= B_2 B_2'$.

8. The hierarchical solution, $B$, then is constructed by the operation

$B = P_1 B_2$. (Table 9).
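
Steps 5 through 8 above can be traced numerically with a short sketch. The loadings used for $P_1$, $P_2$, and $P_3$ below are assumptions: they are back-computed from the entries of Tables 6 through 9 rather than copied from Tables 1 through 5, and the helper names are hypothetical.

```python
from math import sqrt

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def append_uniques(common):
    """Append a diagonal block of unique loadings so each row has unit length."""
    n = len(common)
    out = []
    for i, row in enumerate(common):
        u = sqrt(1.0 - sum(v * v for v in row))
        out.append(list(row) + [u if j == i else 0.0 for j in range(n)])
    return out

def one_hot(n_cols, col, loading):
    """A row of complexity one: a single loading in column `col`."""
    return [loading if j == col else 0.0 for j in range(n_cols)]

# Assumed loadings (inferred from Tables 6-9, not taken from Tables 1-5).
P3 = [[0.8], [0.6], [0.7]]
P2 = [one_hot(3, c, a) for c, a in
      [(0, .8), (0, .7), (1, .4), (1, .6), (2, .9), (2, .3)]]
P1 = [one_hot(6, c, a) for c, a in
      [(0, .8), (0, .9), (1, .7), (1, .6), (2, .8), (2, .4),
       (3, .7), (3, .2), (4, .9), (4, .5), (5, .6), (5, .7)]]

B3 = append_uniques(P3)      # step 5: B3 = [P3 : U3]
P2B3 = matmul(P2, B3)        # step 6
B2 = append_uniques(P2B3)    # step 7: B2 = [P2 B3 : U2]
B = matmul(P1, B2)           # step 8: B = P1 B2

for row in B:
    print(" ".join(f"{v:6.4f}" for v in row))
```

Under these assumed loadings, every row of `B2` has unit length, which is the numerical counterpart of $B_2 B_2' = R_1$ with unities in the diagonal.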
A THEORY OF PATTERN ANALYSIS FOR THE PREDICTION

A theory of pattern analysis is presented for the case of dichotomous
items and a quantitative criterion. This “configural scale” has maximum
validity in the least squares sense. A technique for computing the configural 
scale as a polynomial function of the item scores is given. Tests of significance 
are outlined for such questions as: Is there a linear or non-linear relation 
between the quantitative criterion and the item scores? Does the addition 
of certain items to the test increase the validity of the configural scale? Are 
all the items in the configural scale fully effective? 


1. Introduction 


The purpose of this paper is to derive an optimal method of pattern
analysis for the prediction of a quantitative criterion. Suppose each individual
has taken a test of $t$ items, and each individual has a criterion score on a
quantitative variable. What is the best possible way of predicting the criterion
score from the individual’s answer pattern? How can we make use of all
the information given by the responses to the $t$ items?

A least squares method will be proposed as an adequate solution for the 
case of a quantitative criterion. This method is satisfactory if the objective 
is to minimize the sum of squared errors when predicting from the subject’s 
answer pattern to his criterion score. 

Guttman [2] and Rao [8] have already noted that when the criterion is 
qualitative, the maximum likelihood solution will produce the minimum 
number of misclassifications. Rao has given a general proof of this, which 
holds whether the predictors are quantitative or qualitative. The least 
squares solution presented in this paper is equivalent to using maximum 
likelihood when the distribution of criterion scores within the answer pattern 
is normal. 

Meehl [7] called attention to a special case where two dichotomous
items which correlated zero with the dichotomous criterion would give a
validity of unity when the item patterns were scored by what Meehl termed
the configural method. Meehl’s configural scoring method can be derived
from Rao’s maximum likelihood approach. Horst [4] has shown that Meehl’s
configural scoring corresponds to a polynomial function of the dichotomous
item scores.
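
The special case described above can be reproduced with a few lines of code. The data are hypothetical: four equally frequent answer patterns and a dichotomous criterion that equals one when the two answers agree. The configural score used here is simply an indicator of agreement, chosen for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: one individual per answer pattern.
item1 = [1, 1, 0, 0]
item2 = [1, 0, 1, 0]
criterion = [1, 0, 0, 1]   # 1 when the two answers agree

r1 = pearson(item1, criterion)        # validity of item 1 alone
r2 = pearson(item2, criterion)        # validity of item 2 alone
configural = [1 if a == b else 0 for a, b in zip(item1, item2)]
r_conf = pearson(configural, criterion)

print(r1, r2, r_conf)
```

Each item taken singly is uncorrelated with the criterion, while the pattern score predicts it without error.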


2. Definition of the Configural Scale 


Given a test of $t$ dichotomous items, there will be $2^t$ possible answer
patterns. If the mean criterion score is calculated for each answer pattern,
there will be $2^t$ possible criterion means. Every individual in an answer
pattern is assigned the same score, the mean criterion score for that answer
pattern. This set of scores is called a configural scale (by analogy with Meehl’s
scoring method). It will be shown that an individual’s configural score is,
in the least squares sense, the best prediction of his criterion score.


3. Theorems on the Configural Scale 


THEOREM 1. The zero-order correlation of the configural scale with the 
criterion is equal to or greater than the correlation of the criterion with any other 
set of scores based on the answers to the t dichotomous items. 


In other words, the configural scale based on criterion means has maxi- 
mum validity. The proof of this theorem follows from the least squares 
property of the mean. All individuals in an answer pattern have exactly 
the same item scores and cannot be distinguished from each other on the 
basis of the test. So they must all be assigned the same predicted criterion 
score. What score, when assigned to all individuals who have the same answer 
pattern, will produce the smallest sum of squared deviations from the observed 
criterion scores? This score is, of course, the answer-pattern mean. 

The general Pearson correlation coefficient can be defined as

(1) $r = \sqrt{1 - (W/T)}$,

where $T$ equals the sum of squared deviations about the general criterion
mean, and $W$ equals the sum of squared deviations of the observed from the
predicted criterion scores. Since $W$ is a minimum when the configural scale
is used, $r$ must be a maximum.
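
A minimal sketch of the configural scale and of formula (1) follows. The answer patterns and criterion scores are hypothetical, and the function names are illustrative only.

```python
from collections import defaultdict
from math import sqrt

def configural_scores(patterns, criterion):
    """Assign each individual the mean criterion score of his answer pattern."""
    groups = defaultdict(list)
    for p, c in zip(patterns, criterion):
        groups[p].append(c)
    means = {p: sum(v) / len(v) for p, v in groups.items()}
    return [means[p] for p in patterns]

def validity_from_deviances(predicted, criterion):
    """Formula (1): r = sqrt(1 - W/T)."""
    grand = sum(criterion) / len(criterion)
    T = sum((c - grand) ** 2 for c in criterion)
    W = sum((c - p) ** 2 for c, p in zip(criterion, predicted))
    return sqrt(1.0 - W / T)

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

# Hypothetical data: two items, two individuals per answer pattern.
patterns = [(1, 1), (1, 1), (1, 0), (1, 0), (0, 1), (0, 1), (0, 0), (0, 0)]
criterion = [9.0, 7.0, 2.0, 4.0, 5.0, 5.0, 8.0, 6.0]

scores = configural_scores(patterns, criterion)
r_formula = validity_from_deviances(scores, criterion)
r_product_moment = pearson(scores, criterion)
print(scores, r_formula, r_product_moment)
```

For these data $T = 35.5$ and $W = 6$, and the value from (1) agrees with the product-moment correlation of the configural scores with the criterion.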

This general least squares technique of predicting quantitative scores
from qualitative attributes has been known for some time. Guttman [2]
gave essentially the same method in his section on “The prediction of a
quantitative variate from a set of attributes.” He pointed out that the
correlation in (1) is actually $\eta$, the correlation ratio. It is also equal to the
product-moment correlation of the configural scores with the observed
criterion scores.

The coefficient in (1) can be defined in analysis of variance terms. Let
$B$ equal the deviance (sum of squares) between the answer pattern means;
let $W$ be the deviance (sum of squares) within answer patterns. Then

(2) $r = \sqrt{B/(B + W)}$;

this formula is equivalent to (1). In this way configural scale analysis can
be translated into R. A. Fisher’s terminology of between groups and within
groups.

To summarize, a configural scale has been defined as the set of $2^t$ criterion
averages, one for each answer pattern, and has been shown to possess
maximum validity in the least squares sense.


THEOREM 2. The configural scale can be represented as a polynomial
function of the item scores:

(3) $\hat{C} = b_0 + b_1X_1 + b_2X_2 + \cdots + b_tX_t + b_{12}X_1X_2 + b_{13}X_1X_3 + \cdots + b_{123}X_1X_2X_3 + \cdots$

As an example of Theorem 2, take the two-item configural scale set
forth in Table 1, where $C_i$ is the criterion mean of the $i$th answer pattern.


TABLE 1
A Two-item Configural Scale

Answer        Items         Criterion
Pattern     1       2       Average
   1       yes     yes       $C_1$
   2       yes     no        $C_2$
   3       no      yes       $C_3$
   4       no      no        $C_4$





The polynomial predictor is

(4) $\hat{C} = b_0 + b_1X_1 + b_2X_2 + b_{12}X_1X_2$,

where $X_1$ is the score on item 1,
$X_2$ is the score on item 2,
$\hat{C}$ is the predicted criterion score, and
$b_0$, $b_1$, $b_2$, and $b_{12}$ are the best fitting regression coefficients.


In this paper it is arbitrarily assumed that a No response is scored
zero and a Yes response is scored unity. So, for the Yes-Yes answer pattern:
$X_1 = 1$, $X_2 = 1$, and therefore $X_1X_2 = 1$. It follows that
$\hat{C}_1 = b_0 + b_1 + b_2 + b_{12}$. For the Yes-No answer pattern: $X_1 = 1$, $X_2 = 0$, and therefore
$X_1X_2 = 0$. It follows that $\hat{C}_2 = b_0 + b_1$. In a similar way, equations which
involve only the unknown $b$’s and the predicted $\hat{C}$’s can be derived for each
of the answer patterns.

Note that $X_1X_2$ turns out to be a dichotomous score of either zero or
unity. In general, all possible multiplicative combinations of the item scores
will be either zero or unity.

There are four unknown coefficients in (4); there are $2^2$ or four means
in Table 1. Therefore, an exact solution for the unknown coefficients, such
that $\hat{C} = C$, is always possible. The four equations are

(5) $C_1 = b_0 + b_1 + b_2 + b_{12}$,

(6) $C_2 = b_0 + b_1$,

(7) $C_3 = b_0 + b_2$,

(8) $C_4 = b_0$.

The solution to this set of equations for the two-item case is

(9) $b_0 = C_4$,

(10) $b_1 = C_2 - C_4$,

(11) $b_2 = C_3 - C_4$,

(12) $b_{12} = C_1 + C_4 - C_2 - C_3$.
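
The two-item solution can be checked directly. The sketch below computes the four coefficients from four hypothetical criterion means and confirms that the polynomial reproduces each mean; the function names are illustrative.

```python
def two_item_coefficients(c1, c2, c3, c4):
    """Coefficients of the two-item polynomial from the four pattern means,
    with patterns ordered yes-yes, yes-no, no-yes, no-no as in Table 1."""
    b0 = c4
    b1 = c2 - c4
    b2 = c3 - c4
    b12 = c1 + c4 - c2 - c3
    return b0, b1, b2, b12

def predict(coefficients, x1, x2):
    """Polynomial predictor for item scores x1, x2 (0 = No, 1 = Yes)."""
    b0, b1, b2, b12 = coefficients
    return b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2

# Hypothetical criterion means for the four answer patterns.
means = {(1, 1): 8.0, (1, 0): 3.0, (0, 1): 5.0, (0, 0): 7.0}
coefficients = two_item_coefficients(
    means[(1, 1)], means[(1, 0)], means[(0, 1)], means[(0, 0)])

for pattern, mean in means.items():
    print(pattern, mean, predict(coefficients, *pattern))
```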


In a similar way, if there were three items there would be $2^3 = 8$ unknown
parameters in the polynomial prediction equation

(13) $\hat{C} = b_0 + b_1X_1 + b_2X_2 + b_3X_3 + b_{12}X_1X_2 + b_{13}X_1X_3 + b_{23}X_2X_3 + b_{123}X_1X_2X_3$.

Since there are 8 criterion means, this would again lead to the exact solution
of a set of 8 equations.

To prove Theorem 2, it is only necessary to show that there will always
be at most $2^t$ unknowns in the polynomial prediction formula. The proof is
as follows: there will always be $(t + 1)$ unknowns, since the first term in the
polynomial is the unknown constant $b_0$ and each item score will have an
unknown coefficient $b_1, b_2, b_3, \ldots, b_t$. The square, or higher power, of
any item score reduces to the item score itself ($0^2 = 0$, $1^2 = 1$), so all powers
of $X$ are simply $X$, i.e., $X^n = X$. Therefore, powers of item scores need not
be considered. There will be $t(t - 1)/2$ cross-product terms of the type
$X_1X_2$, $X_1X_3$, etc., since this is the number of ways two objects can be
selected from a set of $t$ objects. There will be $t(t - 1)(t - 2)/3!$ cross-products
of the type $X_1X_2X_3$, $X_1X_2X_4$, $X_2X_3X_4$, etc. Since there is one unknown
constant for each cross-product term, the total number of unknown coefficients
for all terms is

(14) $1 + t + \frac{t(t - 1)}{2!} + \frac{t(t - 1)(t - 2)}{3!} + \cdots + \frac{t!}{t!} = 2^t$.
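
The count of terms can be verified numerically: one constant, $t$ linear terms, and one term for every set of two or more distinct items. The helper name below is hypothetical.

```python
from math import comb

def number_of_terms(t):
    """Constant term plus all cross-products of k distinct items, k = 1..t."""
    return sum(comb(t, k) for k in range(t + 1))

for t in range(1, 7):
    print(t, number_of_terms(t), 2 ** t)
```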

COROLLARY TO THEOREM 2. Whenever the number of empty answer patterns
for $t$ items is $g$, then the number of terms in the polynomial equation will be
$2^t - g$.

Applying this corollary, if the number of filled answer patterns is $(t + 1)$
or less, and the $t$ items are linearly independent, then the linear multiple
regression on the $t$ items will give maximum validity. Of course there may be
cases where the number of filled answer patterns is more than $(t + 1)$ and
the linear regression still gives maximum validity. This case will be
discussed in Section 4 in relation to tests of significance.

For example, consider the perfect Guttman scale, which has both
properties: there are only $(t + 1)$ answer patterns, and the $t$ items are linearly
independent. In Table 1 if the No-Yes answer pattern were empty, the two
items would form a perfect Guttman scale. The polynomial predictor would
become the linear equation

(15) $\hat{C} = b_0 + b_1X_1 + b_2X_2$,

where

(16) $b_0 = C_4$,

(17) $b_1 = C_2 - C_4$,

and

(18) $b_2 = C_1 - C_2$.

These solutions for the $b$’s make $\hat{C}_i = C_i$ for the $i$th answer pattern.

The Guttman scale score is defined as the sum of the item scores,
$S = X_1 + X_2$. In other words, Guttman sets $b_0 = 0$, $b_1 = b_2 = 1$, in (15).
Guttman [3, p. 89] states that for a perfect scale, this sum score is sufficient for
maximum validity. A multiple regression on the item scores is not necessary 
because the scale score contains all the necessary information: “The pre- 
dictability of any outside variable from the scale scores is the same as the 
predictability from the multivariate distribution with the attributes. The 
zero-order correlation with the scale score is equivalent to the multiple cor- 
relation with the universe. Hence, scale scores provide an invariant quantifi- 
cation of the attributes for predicting any outside variable whatsoever.” 

This statement is correct only if the phrase “correlation ratio” is
substituted for “zero-order correlation.” There are many cases where the
zero-order correlation of the scale score with the criterion is less than the multiple
correlation of the criterion with the item scores. In order to demonstrate
this, examine the perfect Guttman scale shown in Table 2.


TABLE 2
A Perfect Guttman Scale for Two Items

Answer        Items         Criterion
Pattern     1       2       Average
   1       yes     yes       $C_1$
   2       yes     no        $C_2$
   3       no      no        $C_3$




The best fitting multiple linear regression is

(19) $\hat{C} = b_0 + b_1X_1 + b_2X_2$,

where

(20) $b_0 = C_3$,

(21) $b_1 = C_2 - C_3$,

(22) $b_2 = C_1 - C_2$.


In order for the scale score to have a validity equal to the multiple
correlation, $b_1$ must equal $b_2$. This imposes the restriction that

(23) $C_2 - C_3 = C_1 - C_2$.

This is a stringent requirement that cannot always be met. For example,
if $C_2$ is less than $C_1$ and is also less than $C_3$, this makes $b_2$ positive and $b_1$
negative, and the scale score will tend to have near-zero validity.
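
The example just described can be made concrete. The sketch below uses hypothetical criterion scores for the three answer patterns of a perfect two-item Guttman scale, chosen so that the middle pattern has the lowest mean, and compares the validity of the sum score with that of the multiple regression.

```python
from math import sqrt

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

# Hypothetical criterion scores; pattern means are C1 = 6, C2 = 2, C3 = 6.
data = {(1, 1): [7.0, 5.0], (1, 0): [3.0, 1.0], (0, 0): [7.0, 5.0]}
c1, c2, c3 = (sum(v) / len(v) for v in (data[(1, 1)], data[(1, 0)], data[(0, 0)]))

# Regression coefficients for the perfect scale, as in (20)-(22).
b0, b1, b2 = c3, c2 - c3, c1 - c2

x1, x2, crit = [], [], []
for (item1, item2), scores in data.items():
    for s in scores:
        x1.append(item1)
        x2.append(item2)
        crit.append(s)

scale_score = [a + b for a, b in zip(x1, x2)]                 # S = X1 + X2
regression = [b0 + b1 * a + b2 * b for a, b in zip(x1, x2)]   # fitted values

r_scale = pearson(scale_score, crit)
r_multiple = pearson(regression, crit)
print(b1, b2, r_scale, r_multiple)
```

With these numbers the two regression weights have opposite signs, the sum score has zero validity, and the multiple correlation is about .88.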


The Guttman scale score is a special case of the ordinary total score,
where the item scores are simply added together. In (3) for the general
configural scale, if $b_1 = b_2 = b_3 = \cdots = b_t = 1$, and all other coefficients
equal zero, then

(24) $\hat{C} = X_1 + X_2 + X_3 + \cdots + X_t$.

Similarly, the multiple regression scale is the case where only the linear
portion of (3) is used, i.e.,

(25) $\hat{C} = b_0 + b_1X_1 + b_2X_2 + \cdots + b_tX_t$.

Obviously, the validity of these scales can be ranked as follows:

configural $\geq$ multiple regression $\geq$ total score.

It should be emphasized, of course, that this statement holds only for
the sample being analyzed. Because of the loss of the degrees of freedom for
the configural scale, this relation may not be found when the scale derived
in one sample is applied to another sample.


4. Analysis of Variance Tests of Significance 


As has been mentioned previously, the least squares configural scale is 
a maximum likelihood solution when the distribution of criterion scores 
within each answer pattern is normal. If further, the $2^t$ criterion score
variances are homogeneous, then the analysis of variance technique can be used
for tests of significance. The polynomial function used in this paper can be 
shown to be an exact algebraic transformation of Fisher’s analysis of variance 
mathematical model for the case of equal answer pattern frequencies. This 
is true whenever the systematic portion of the analysis of variance model 
is equal to the cell mean. 

These tests can be used to answer such questions as: Is the validity of 
the configural scale significantly greater than zero? Is the validity of the
configural scale significantly greater than that of the total score? Will the
linear multiple regression give maximum validity, or are non-linear terms 
necessary? If m items are added to the test, will the configural scale validity 
increase? Are there certain terms in the polynomial predictor which do not 
contribute significantly to the validity? All these questions and other similar 
ones can be answered by the general F-ratio test. 

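For example, suppose the question arises, “Is the validity of the configural
scale greater than zero?” The exact F-ratio test is

(26)* $F = \left(\frac{B}{W}\right)\left(\frac{N - 2^t}{2^t - 1}\right) = \left(\frac{\eta^2}{1 - \eta^2}\right)\left(\frac{N - 2^t}{2^t - 1}\right)$

with $(2^t - 1)$ over $(N - 2^t)$ degrees of freedom.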
Some definitions are needed to make the terms in (26) clear from a
computational point of view. Let

$t$ be the number of dichotomous items,
$N$ be the total number of individuals,
$n_i$ be the number of subjects in the $i$th answer pattern,
$C_{ij}$ be the criterion score for the $j$th individual in the $i$th answer pattern,
$\bar{C}_{i.}$ be the average criterion score for the $i$th answer pattern, i.e., the
configural score for the $i$th answer pattern,
$\bar{C}_{..}$ be the average criterion score for all $N$ individuals.

*In formula (26) and elsewhere it is assumed that all $2^t$ answer patterns are filled.
If $g$ answer patterns are empty, then the degrees of freedom for $W$ equals $(N - 2^t + g)$
and the degrees of freedom for $B$ equals $(2^t - 1 - g)$.

Then $B$ is the between sum of squares, i.e.,

(27) $B = \sum_i n_i(\bar{C}_{i.} - \bar{C}_{..})^2$.

$W$ is the within or residual sum of squares, i.e.,

(28) $W = \sum_i \sum_j (C_{ij} - \bar{C}_{i.})^2$,

and

(29) $\eta^2 = B/T = r^2$.
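
The quantities $B$, $W$, and $\eta^2$, together with the F-ratio for the hypothesis that the configural validity is zero, can be computed as in the following sketch. The data are hypothetical, all $2^t$ answer patterns are assumed to be filled, and the function name is illustrative.

```python
def configural_anova(groups, t):
    """Between and within deviances, eta squared, and the F-ratio with
    (2**t - 1) over (N - 2**t) degrees of freedom.

    groups maps each answer pattern to the list of criterion scores
    observed in it; every one of the 2**t patterns is assumed filled.
    """
    n_total = sum(len(v) for v in groups.values())
    grand = sum(sum(v) for v in groups.values()) / n_total
    between = sum(len(v) * (sum(v) / len(v) - grand) ** 2 for v in groups.values())
    within = sum((c - sum(v) / len(v)) ** 2 for v in groups.values() for c in v)
    df_between, df_within = 2 ** t - 1, n_total - 2 ** t
    f_ratio = (between / df_between) / (within / df_within)
    eta_sq = between / (between + within)
    return between, within, eta_sq, f_ratio, (df_between, df_within)

# Hypothetical data: two items, two individuals per answer pattern.
groups = {(1, 1): [9.0, 7.0], (1, 0): [2.0, 4.0],
          (0, 1): [5.0, 5.0], (0, 0): [8.0, 6.0]}
B, W, eta_sq, F, df = configural_anova(groups, t=2)
print(B, W, eta_sq, F, df)
```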


Similarly one can find out if $S$, the total score obtained by just adding
up the unweighted item scores, is sufficient to produce maximum validity.
Let $r_s$ be the correlation of $S$ with the criterion; then

(30) $F = \left(\frac{\eta^2 - r_s^2}{1 - \eta^2}\right)\left(\frac{N - 2^t}{2^t - 2}\right)$

with $(2^t - 2)$ over $(N - 2^t)$ degrees of freedom.

If the configural scale validity, $\eta$, is significantly greater than that of
the total score, the next question which can be raised is whether the relation
between the item responses and the criterion is linear or non-linear. It is
suspected that with most current mental tests, there is a linear relation. This
question can be answered by seeing if $\eta^2$, the squared configural scale validity,
is significantly larger than $R^2$, the squared multiple correlation based on
the $t$ items.

(31) $F = \left(\frac{\eta^2 - R^2}{1 - \eta^2}\right)\left(\frac{N - 2^t}{2^t - t - 1}\right)$

with $(2^t - t - 1)$ over $(N - 2^t)$ degrees of freedom.


Another question that can be answered is as follows: suppose $m$ items
are added to a $k$-item test. Is the validity of the $(k + m)$-item configural
scale greater than that of the $k$-item configural scale? The F-ratio significance
test is

(32) $F = \left(\frac{W_k - W_{k+m}}{W_{k+m}}\right)\left(\frac{N - 2^{k+m}}{2^{k+m} - 2^k}\right) = \left(\frac{\eta_{k+m}^2 - \eta_k^2}{1 - \eta_{k+m}^2}\right)\left(\frac{N - 2^{k+m}}{2^{k+m} - 2^k}\right)$,

where $W_k$ and $\eta_k$ refer to statistics calculated on the basis of the $k$-item
configural scale, $W_{k+m}$ and $\eta_{k+m}$ refer to statistics calculated on the basis of
the $(k + m)$-item configural scale, and the degrees of freedom are $(2^{k+m} - 2^k)$
over $(N - 2^{k+m})$.

In general, it is possible to test the significance of any particular subset
of terms in the polynomial predictor. Let $H_0$ refer to any (null) hypothesis
which restricts some of the parameters of the polynomial predictors on an
a priori basis. Let $H_1$ refer to the non-null hypothesis which places no
restrictions on the $2^t$ parameters. Then the general formula for $F$ is

(33) $F = \left(\frac{W_0 - W_1}{W_1}\right)\left(\frac{\nu_1}{\nu_0 - \nu_1}\right)$,

where $\nu_0 = N$ minus the number of parameters used in predicting the
criterion scores according to $H_0$,
$W_0$ = the deviance (sum of squared errors of prediction) obtained by
applying $H_0$,
$\nu_1 = N - 2^t$,
$W_1$ = the deviance obtained by applying the polynomial predictor, and
the degrees of freedom are $(\nu_0 - \nu_1)$ over $(N - 2^t)$.

Another way of writing it is

(34) $F = \left(\frac{\eta_1^2 - \eta_0^2}{1 - \eta_1^2}\right)\left(\frac{\nu_1}{\nu_0 - \nu_1}\right)$,

where

$\eta_0^2 = \frac{T - W_0}{T}$,

$\eta_1^2 = \frac{T - W_1}{T}$,

and $T$ is the deviance about the general mean.

Equations (33) and (34) give the general solution for testing what are
known as “linear hypotheses” [5, pp. 298-302]. This allows the reader to
construct his own test of significance for any question about the polynomial
predictors.
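
The two forms of the general F-ratio can be compared on a small numerical case. The deviances below are hypothetical values for $N = 8$ individuals and $t = 2$ items, with the null hypothesis restricting the predictor to its linear terms; the function names are illustrative.

```python
def general_f(w0, w1, nu0, nu1):
    """F from the deviances under the restricted (w0) and full (w1) predictors."""
    return ((w0 - w1) / w1) * (nu1 / (nu0 - nu1))

def general_f_from_eta(eta0_sq, eta1_sq, nu0, nu1):
    """The same F written in terms of squared validities."""
    return ((eta1_sq - eta0_sq) / (1.0 - eta1_sq)) * (nu1 / (nu0 - nu1))

# Hypothetical values.
N, t = 8, 2
T = 35.5    # deviance about the general mean
W1 = 6.0    # deviance under the full polynomial predictor (2**t parameters)
W0 = 30.5   # deviance under H0: linear terms only (t + 1 parameters)

nu0 = N - (t + 1)
nu1 = N - 2 ** t
F = general_f(W0, W1, nu0, nu1)
F_eta = general_f_from_eta((T - W0) / T, (T - W1) / T, nu0, nu1)
print(F, F_eta, (nu0 - nu1, nu1))
```

Both forms give the same value, since $\eta^2 = (T - W)/T$ is only a rescaling of the deviance.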

5. Discussion 


Many ingenious methods of pattern and profile analysis are being used 
today in an attempt to increase the predictability of the criterion. Gaier and 
Lee [1] in a partial review of the literature, summarized some 28 references. 
Presumably, one could take a set of data and compare all of the known 
methods to see which has the greatest validity. This would be a laborious 
and inefficient way of solving the problem. As Horst has said [4, p. 8], “The 
work in this area will be much more fruitful when more precise and rigorous
mathematical concepts are developed to take the place of verbal formulations 
and analyses based on empirical or trial and error manipulations of the data.” 

Given the case of $t$ dichotomous items and a quantitative criterion,
the least squares approach shows that the configural scale, a $t$th degree
polynomial function of the $t$ item scores, possesses maximum validity. To
the extent that any of the present techniques of pattern analysis can reach
this maximum validity, they are special cases of the polynomial function.
It is always possible to write a mathematical description of the configural
scale and apply the usual matrix algebra theorems to the result. For example,
the polynomial predictor in matrix form is

(35) $C = Xb$,

where $C$ is the $2^t$ by one column vector of observed criterion averages,
$X$ is a $2^t$ by $2^t$ matrix of zero-one entries, where the rows represent
answer patterns and the columns represent terms of the polynomial,
$b$ is the $2^t$ by one column of coefficients.
If all the $2^t$ answer patterns are filled, and $X$ is non-singular, then

(36) $b = X^{-1}C$

is an exact solution. If some of the answer patterns are empty, $X$ can still
be made into a square non-singular matrix by eliminating the corresponding
polynomial terms and the least squares solution is still as above.
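
The matrix form can be illustrated for any small number of items. The sketch below builds the zero-one matrix $X$ and obtains $b$ without an explicit matrix inverse: it uses an inclusion-exclusion sum over sub-patterns, which is a substitute technique chosen here for brevity and not the computation described in the text. The result is then checked against $C = Xb$. All names and the criterion means are hypothetical.

```python
from itertools import combinations, product

def design_matrix(t):
    """Rows are answer patterns, columns are polynomial terms (item subsets)."""
    patterns = list(product([1, 0], repeat=t))
    terms = [s for k in range(t + 1) for s in combinations(range(t), k)]
    X = [[int(all(p[i] for i in term)) for term in terms] for p in patterns]
    return patterns, terms, X

def coefficients(patterns, terms, means):
    """Solve C = Xb by inclusion-exclusion over the items in each term."""
    by_pattern = dict(zip(patterns, means))
    t = len(patterns[0])
    b = []
    for term in terms:
        total = 0.0
        for k in range(len(term) + 1):
            for sub in combinations(term, k):
                pattern = tuple(1 if i in sub else 0 for i in range(t))
                total += (-1) ** (len(term) - k) * by_pattern[pattern]
        b.append(total)
    return b

# Hypothetical criterion means for the 8 answer patterns of three items.
patterns, terms, X = design_matrix(3)
C = [8.0, 3.0, 5.0, 7.0, 4.0, 6.0, 2.0, 9.0]
b = coefficients(patterns, terms, C)

reproduced = [sum(x * coef for x, coef in zip(row, b)) for row in X]
print(b)
print(reproduced)
```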

The principal advantage of calculating b is that the relative importance 
of each term in the polynomial predictor is specified numerically. For example, 
suppose the criterion is not a function of the second-order interactions of 
the items. Then, when $b$ is calculated, all coefficients of the type $b_{ij}$ will be
near zero. In such a case, when it has been hypothesized that only k of the 
regression coefficients are non-zero, a least squares solution can be obtained 
which uses only the k specified terms. 

It is convenient to compute first an approximate least squares solution. 
If this solution gives an adequate fit, then only the k specified terms are 
needed. If the approximate solution does not give an adequate fit, then it is 
necessary to compute the exact least squares solution. 

The approximate least squares solution is as follows: Let $X_k$ be the
$2^t$ by $k$ matrix obtained by selecting the specified $k$ columns from $X$. Then

(37) $b_k = (X_k'X_k)^{-1}X_k'C$.

The exact least squares solution is as follows: Let $Z_k$ be an $N$ by $k$ matrix
whose general element, $x_{jm}$, is the score of the $j$th individual on the $m$th
term of the polynomial. Then, given that the $j$th individual is in the $i$th
answer pattern, the $j$th row of $Z_k$ is exactly equal to the $i$th row of $X_k$.
Essentially $Z_k$ is an expanded form of $X_k$ where each of the rows of $X_k$ has
been repeated $n_i$ times. Let $C$ be an $N$-rowed column vector where $C_j$ is the
criterion score of the $j$th individual. Then

(38) $w_k = (Z_k'Z_k)^{-1}Z_k'C$

is the set of regression coefficients which give the exact least squares fit.
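
The difference between the approximate and exact solutions appears as soon as the answer patterns have unequal frequencies. The following sketch fits only the constant and the two linear terms of a two-item predictor, first to the four pattern means and then to the individual scores. The data and the small normal-equations solver are illustrative assumptions.

```python
def solve(a, y):
    """Solve a x = y by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [v] for row, v in zip(a, y)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def least_squares(rows, y):
    """Coefficients from the normal equations (rows' rows) b = rows' y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Selected columns: constant, X1, X2 (the cross-product term is omitted).
x_rows = {(1, 1): [1.0, 1.0, 1.0], (1, 0): [1.0, 1.0, 0.0],
          (0, 1): [1.0, 0.0, 1.0], (0, 0): [1.0, 0.0, 0.0]}
# Hypothetical criterion scores with unequal pattern frequencies.
scores = {(1, 1): [9.0, 7.0], (1, 0): [2.0, 4.0, 3.0],
          (0, 1): [5.0], (0, 0): [8.0, 6.0, 7.0, 7.0]}

# Approximate solution: one row per answer pattern, fitted to pattern means.
means = [sum(scores[p]) / len(scores[p]) for p in x_rows]
approximate = least_squares(list(x_rows.values()), means)

# Exact solution: each pattern row repeated once per individual.
z_rows = [x_rows[p] for p in x_rows for _ in scores[p]]
c = [s for p in x_rows for s in scores[p]]
exact = least_squares(z_rows, c)

print(approximate)
print(exact)
```

When every pattern has the same frequency the two solutions coincide, since the expanded matrix then only repeats each row of the pattern matrix the same number of times.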
Equation (38) provides a test on whether any specified set of item
interactions is related to the criterion. It can be a powerful tool for testing
psychological hypotheses about the relation between subject’s responses to
the items and his criterion score. This could be the most useful function of
configural analysis.

Another possibility is to use the configural technique for item analysis. 
This would involve an empirical search pattern with or without the use of 
hypotheses. However, considerable caution is needed in such empirical 
applications of configural scoring since the degrees of freedom used for 
computing the regression coefficients increase exponentially with the number 
of selected items. In general, any procedure involving configural scoring 
implies a very small number of items and a large number of subjects. Also, 
unless the items have been specially constructed, or there are good theoretical 
grounds for believing that non-linear relations exist, the usual total score 
will probably give maximum validity. 

It is especially necessary to be careful in generalizing from the analysis 
sample to future samples. Because of loss of degrees of freedom, there will 
be a sizeable decrease in the cross-validity.

One procedure for guarding against errors in generalization to future 
samples would be as follows: (a) Compute back-validity (validity on the 
analysis sample) and test whether it differs from zero. (b) If the back-validity 
is significantly greater than zero, test to see if it is significantly greater than 
the multiple correlation and total score back-validities. (c) If the above tests 
are positive, the cross-validities for each model should be estimated by 
Lord’s method [6] to see if the configural scale has any practical advantage. 
(d) As a final safeguard, the actual cross-validities can be computed and 
tested for significance. If the estimated cross-validity differences fall to zero, 
then it is unnecessary to analyze the cross-validation sample.

The authors are grateful to Professor Charles F. Wrigley and Dr. Maurice
Lorr for advice and critical comment; and to Mrs. Ruth Heitman for her
typing services.

THE EXPECTED VARIANCE OF THE SAMPLING ERRORS FOR
A SET OF ITEM-CRITERION CORRELATIONS


HUBERT E. BROGDEN
PERSONNEL RESEARCH BRANCH
THE ADJUTANT GENERAL’S OFFICE*

An expression for the expected variance of the sampling errors for the 
validities of a set of correlated items that is computationally feasible when 
the number of items is large is developed. Since the item difficulties are 
assumed to be constant, the estimate must be applied to pools or sub-pools 
of items reasonably homogeneous with respect to difficulty. 


In item analysis and in a number of related problems, an expression 
giving the variance of the sampling errors of a set of item validities would 
often be useful. The standard error of the item validity coefficients is not a 
satisfactory estimate since it is well known that the sampling errors in the 
validities of a set of correlated variables are themselves correlated. Wishart [1] 
has presented a general though complex solution giving the sampling dis- 
tribution of the covariance matrix for a set of correlated variables. His 
solution is not, however, feasible and may not be appropriate when applied 
to the problems arising in dealing with the sampling variation of a set of 
item validities. This note will propose a solution to the problem of estimating 
the variance, but not the distribution, of the sampling errors for a set of 
item validities that appears to be both simple and feasible. 

At least one application of such an estimate is obvious. Many investiga- 
tors, upon finding that 5 per cent of a set of items are valid at the 5 per cent 
level of confidence, conclude that their item analysis data are of no value. 
It is hoped that the sampling estimate to be presented will permit a more 
accurate conclusion in problems of this nature. 


Definition of Symbols 


$r_i = r_{x_i y}$, the point-biserial (or product-moment) correlation, in
the sample, between any of a set of items and an external
criterion. The items $(x_1, x_2, \ldots, x_i, \ldots, x_n)$ take on values
of 1 for the correct choice and 0 for the incorrect choice. The
correct alternative must be determined, though arbitrarily,
before computing the item validities.
$\tilde{r}_i = \tilde{r}_{x_i y}$, the point-biserial correlation between any item and the
criterion in the universe.
$\sigma_i = \sigma_{x_i} = \sqrt{p_i q_i}$, the standard deviation of an item score in a sample.
$\tilde{\sigma}_i$ = the standard deviation of an item score in the universe.
$t = \sum_i x_i$, the sum of the item scores, a score obtained by
summing the number of correct (as defined above) alternatives.
$N$ = the number of individuals in the sample.
$E(\sigma^2_{(r_i - \tilde{r}_i)})$ = the expected variance of the errors in a set of item validities
for a sample.

*The opinions expressed are those of the author and do not reflect official Department
of the Army policy.

Assumptions 


1. It is assumed that all items have equal difficulty and, as a corollary,
that $\sigma_i$ is constant across items.

2. It is assumed that $\sigma_i$ and $\sigma_t$ are satisfactory estimates of, respectively,
$\tilde{\sigma}_i$ and $\tilde{\sigma}_t$. This assumption is similar to, but less restrictive than, the
usual assumption in similar developments that the predictors remain fixed
in going from the sample to the universe.

3. It is assumed that $t$ and the criterion are normally distributed.


The Derivation 
The problem is to determine, in a form feasible for calculation, an
expression for the expected variance of the errors for a set of item validities
in a sample. Since the errors are the discrepancies between the universe and
sample values, the expected value of the variance of the errors in the validities
of a set of items has the basic definition

(1) $E(\sigma^2_{(r_i - \tilde{r}_i)}) = E\{(1/n) \sum (r_i - \tilde{r}_i)^2 - [(1/n) \sum (r_i - \tilde{r}_i)]^2\}$

(2) $\quad = E[(1/n) \sum (r_i - \tilde{r}_i)^2] - E[(1/n) \sum r_i - (1/n) \sum \tilde{r}_i]^2$.

From a well-known formula, the correlation of the sum of the items ($t$) with
the criterion may be written, if $\sigma_i$ is assumed to be constant,

(3) $r_{ty} = (\sigma_i/\sigma_t) \sum r_i$,

and, consequently,

(4) $r_{ty}(\sigma_t/\sigma_i) = \sum r_i$.

By similar reasoning

(5) $\tilde{r}_{ty}(\tilde{\sigma}_t/\tilde{\sigma}_i) = \sum \tilde{r}_i$.

Substituting in (2)

(6) $E(\sigma^2_{(r_i - \tilde{r}_i)}) = E[(1/n) \sum (r_i - \tilde{r}_i)^2] - E[(1/n) r_{ty}(\sigma_t/\sigma_i) - (1/n) \tilde{r}_{ty}(\tilde{\sigma}_t/\tilde{\sigma}_i)]^2$.

From assumption 2, $\tilde{\sigma}_t$ equals $\sigma_t$ and $\tilde{\sigma}_i$ equals $\sigma_i$. Reducing further,

(7) $E(\sigma^2_{(r_i - \tilde{r}_i)}) = (1/n) \sum E(r_i - \tilde{r}_i)^2 - (1/n^2)(\sigma_t^2/\sigma_i^2) E(r_{ty} - \tilde{r}_{ty})^2$.

The expected value of the square of $(r_i - \tilde{r}_i)$ or of $(r_{ty} - \tilde{r}_{ty})$ is the average
of their squared values over an infinite series of samples and is, consequently,
also equal to their sampling variance. Formulas are available to evaluate
them, and $\sigma_t$ can be obtained from the sample. Thus, (7) provides a solution
to the original problem, although this solution may be somewhat tedious
since the sampling variance for each item validity must be determined.
In most item analysis problems, the $r_i$ should be sufficiently close to zero
so that $1/N$ may be regarded as a satisfactory estimate of their sampling
variance. In this event, (7) will reduce to

(8) $E(\sigma^2_{(r_i - \tilde{r}_i)}) = (1/N)[1 - (1 - \tilde{r}_{ty}^2)(\sigma_t^2/n^2\sigma_i^2)]$.

Equation (8) gives a feasible solution to the original problem.

(8) will reduce further if the average item intercorrelation is zero. If
this is true, the average of the off-diagonal entries of the full symmetric
matrix of item covariances (whose sum equals $\sigma_t^2$) will also equal zero, and
$\sigma_t^2$ will reduce to $n\sigma_i^2$. If n is large,


(9) $E(\sigma^2_{(r_i - \bar{r}_i)}) \approx 1/N.$


(9) is, of course, the expected result if full statistical independence of the
items is assumed. The derivation just presented shows that a less restrictive
assumption permits the use of this simple formula. In practice, if $\sigma_t^2$ does
not exceed $n\sigma_i^2$, 1/N may be used in place of (8) to give an estimate of the
variance of the sampling errors across a set of items. It should be stressed
that the less restrictive assumption applies in estimating the variance of the
sampling errors; nothing has been demonstrated regarding the distribution
of sampling errors.

While the writer had primary interest in the expected variance of the
sampling errors of the validities of a set of dichotomous items, an adaptation
of (7) will apply to the validities of a set of continuous predictors. The assumption
of equal item difficulty is, of course, unnecessary. It must be assumed
that the standard deviations of the individual predictors and the standard
deviation of the sum of the predictors are the same in the sample and universe
and that t is redefined as the sum of the predictors with each predictor
converted to unit standard deviation form. With these assumptions,


(10) $E(\sigma^2_{(r_i - \bar{r}_i)}) = (1/n) \sum E(r_i - \bar{r}_i)^2 - \sigma_t^2(1 - \bar{r}_{tc}^2)/n^2N.$


Discussion 


To apply the formula given in (8), t must be determined for each case
in the sample, and $\sigma_t$ must be computed. The scoring run to compute t is
the greater part of the labor involved. By the definition given to t, the sign
of an item in such a scoring run must correspond to the sign used when the
item validities were initially computed.

The formula behaves as would be expected in those special cases where 
the solution is obvious. If the intercorrelations of the items are all plus one, 
the variance of the errors of a set of item validities should be zero and the 
solution by the formula yields zero variance of the errors. If the items are 
independent of each other, the errors should be independent of each other, 
as shown in (9), and the formula should and does simplify to the sampling 
variance of a correlation coefficient. 

Since the p-values of the items were assumed to be constant in deriving 
the formula, a pool of items involving a considerable range of difficulty 
will have to be subdivided into pools of constant item difficulty before the 
formula is applied. In the author’s opinion, a range of difficulty of at least
.10 can be permitted without introducing serious error, since the variation
of $\sigma_i$ is quite small within such a p-value range.

A NECESSARY AND SUFFICIENT FORMULA FOR MATRIC
FACTORING


LOUIS GUTTMAN


CENTER FOR ADVANCED STUDY IN THE BEHAVIORAL SCIENCES*


For the purpose of extracting factors from matrices, it is proved
that a certain formula is both necessary and sufficient. In factor analysis,
the formula may be applied either to the correlation matrix, or directly to
the score matrix (assuming the communality problem is solved). As many
factors as desired can be extracted in one operation. Having such a compact
formulation is useful for teaching as well as computing, since it
includes all techniques of factor extraction as special cases.


Let A be an arbitrary (real) matrix of order p × q and rank r. It is
desired to extract factors from A by finding a matrix $A_1$ of rank s, where
s ≤ r, of the form


(1) $A_1 = BDC,$


where D is non-singular and of order s, such that $A_2$ shall be of rank r − s,
where


(2) $A_2 = A - A_1.$


This requires further that B and C be of rank s and of orders p × s and
s × q, respectively.

Such a problem occurs in factor analysis in at least two different but 
closely related ways: 


(a) A may be the observed score matrix after unique-factor 
scores are subtracted out, for q individuals on p tests. 
Then B (or BD) can be regarded as common-factor loadings 
of the tests, and DC (or C) as common-factor scores of 
the respondents. 
(b) A may be the observed correlation matrix with communalities
in the main diagonal. In this case, p = q; C = B';
and A, $A_1$, $A_2$, and D are restricted to being Gramian.
Then B again gives common-factor loadings of the tests,
while now D is the inverse of the covariance matrix of
the common factors, being a diagonal matrix when the
common factors are orthogonal.
*On leave from the Israel Institute for Applied Social Research. This research was 
facilitated in part by a grant from the Lucius N. Littauer Foundation to the American 
Committee for Social Research in Israel, Inc.

In either case, when s = 1, a single factor is extracted from A by (2),
as by the centroid method, principal axis, or other ways of reducing r by 1.
When s > 1, several factors are extracted simultaneously, as in multiple-group
methods. When s = r, or $A_2$ = 0, all factors are extracted in one step
(cf. [1, 2]). The relationship between the factoring of scores and the factoring
of correlation coefficients has been analyzed in [1, 2].

It has been shown in [1] that a sufficient formula for $A_1$ is as follows.
Let X and Y be arbitrary weight matrices of orders s × p and s × q, respectively,
and such that XAY' is non-singular. Let


(3) $D = (XAY')^{-1}.$

Thus, D is of rank and order s. Compute B and C by the formulas

(4) $B = AY', \quad C = XA.$


Then if $A_1$ is computed by formula (1), $A_1$ must be of rank s and $A_2$ in (2)
of rank r − s. For factoring the correlation matrix as in case (b) above, let
Y = X.

This sufficient technique for extracting factors is actually only a generali- 
zation of Lagrange’s technique for reducing bilinear forms, as pointed out 
in [1]. 
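
The sufficiency statement can be checked numerically on a small case. The following is a minimal sketch in plain Python with exact rational arithmetic; the matrix A, the weight matrices X and Y, and all function names are hypothetical choices made for illustration and do not come from the paper.

```python
from fractions import Fraction


def matmul(P, Q):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def transpose(P):
    return [list(col) for col in zip(*P)]


def rank(M):
    """Rank by Gauss-Jordan elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    rk = 0
    for col in range(len(M[0])):
        pivot = next((i for i in range(rk, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[rk], M[pivot] = M[pivot], M[rk]
        for i in range(len(M)):
            if i != rk and M[i][col] != 0:
                f = M[i][col] / M[rk][col]
                M[i] = [x - f * y for x, y in zip(M[i], M[rk])]
        rk += 1
        if rk == len(M):
            break
    return rk


def inverse(M):
    """Inverse of a non-singular square matrix by Gauss-Jordan elimination."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if aug[i][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for i in range(n):
            if i != col and aug[i][col] != 0:
                f = aug[i][col]
                aug[i] = [x - f * y for x, y in zip(aug[i], aug[col])]
    return [row[n:] for row in aug]


def extract_factors(A, X, Y):
    """Formulas (3), (4), (1), (2): return A1 = BDC and the residual A2 = A - A1."""
    Yt = transpose(Y)
    D = inverse(matmul(matmul(X, A), Yt))   # (3)
    B = matmul(A, Yt)                       # (4)
    C = matmul(X, A)                        # (4)
    A1 = matmul(matmul(B, D), C)            # (1)
    A2 = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, A1)]  # (2)
    return A1, A2


# Hypothetical example: p = 4, q = 3, r = 3, s = 2.
A = [[1, 2, 0], [0, 1, 1], [1, 0, 1], [2, 3, 1]]
X = [[1, 0, 0, 0], [0, 1, 0, 0]]   # s x p weights
Y = [[1, 0, 0], [0, 1, 0]]         # s x q weights
A1, A2 = extract_factors(A, X, Y)
ranks = (rank(A), rank(A1), rank(A2))
```

With these particular weights XAY' is the upper-left 2 × 2 block of A, so the extraction amounts to pivoting on that block; other choices of X and Y would correspond to other factoring methods.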

It is of considerable interest to inquire* as to whether any other kind
of formula is possible for $A_1$ in (1), keeping D non-singular, but removing
conditions (3) and (4). An important restriction in (4) is that the factor
matrices B and C are linear transformations of A. Is it possible for factors
to exist that are not such functions of A?

The answer turns out to be in the negative. If $A_1$ is of rank s and reduces
the rank of A to r − s, then $A_1$ in the form (1) must always have B, C, and D
of the forms (4) and (3). Our formulas are necessary as well as sufficient.

For the proof, suppose $A_1$ is of the form (1) and is of rank s, and D is
non-singular of order s. Thus, B and C are of orders p × s and s × q, respectively.
Define the partitioned matrix E to be


(5) $E = \begin{bmatrix} A & B \\ C & D^{-1} \end{bmatrix}.$


E is A enlarged by s rows and s columns. By direct multiplication it is verified
that


(6) $E = \begin{bmatrix} I_p & B \\ 0 & D^{-1} \end{bmatrix} \begin{bmatrix} A_2 & 0 \\ DC & I_s \end{bmatrix},$


where $I_p$ and $I_s$ are the unit matrices of order p and s, respectively, and $A_2$
is the residual matrix defined by (2).


*This problem was suggested to the writer by Dr. W. A. Gibson.

Let t be the rank of $A_2$. Since the first matrix on the right of (6) is clearly
non-singular, and the rank of the second matrix is clearly s + t (the sum of
the ranks of $A_2$ and $I_s$), the rank of E must be s + t. Therefore, a necessary
and sufficient condition that t = r − s is that the rank of E equal r. But in
the right of (5), A by itself is already of rank r, so a necessary and sufficient
condition that E be of rank r is that the last submatric row in the right of
(5) be linearly dependent on the first submatric row, or that there exist an
X such that


(7) $C = XA, \quad D^{-1} = XB.$


Similarly, the last submatric column must be linearly dependent on the
first submatric column, or there exists a Y such that


(8) $B = AY', \quad D^{-1} = CY'.$


Note that X and Y need not be uniquely determined when p > r and q > r,
respectively. The first parts of (7) and (8) yield (4); substituting the first part
of (7) in the last part of (8) yields (3).

Thus, simultaneously both the necessity and sufficiency of the factoring 
formulas, in place of only the sufficiency proof in [1], have been proved. 
All possible factoring methods, whether directly on the score matrix or on 
the correlation matrix, can differ only in the choice of weight matrices X 
and Y. This fact not only gives a unified and simplified approach to practical 
computing procedures (cf. [2, 4]), but also—as Lubin has pointed out—serves 
as a simple basis for teaching factor analysis to beginning students [5]. 

It must be cautioned, however, that the above formulas assume the
communality problem solved. The gravity of this assumption is analyzed
in [3].

EXACT PROBABILITIES FOR CONTINGENCY TABLES USING
BINOMIAL COEFFICIENTS


JAMES M. SAKODA
AND
BURTON H. COHEN


UNIVERSITY OF CONNECTICUT


The use of binomial coefficients in place of factorials to shorten the 
calculation of exact probabilities for 2 X 2 and 2 X r contingency tables is 
discussed. A useful set of inequalities for estimating the cumulative prob- 
abilities in the tail of the distribution from the probability of a single table 
is given. A table of binomial coefficients with four significant places and n 
through 60 is provided. 


A 2 X 2 contingency table and a numerical example are represented as 
follows: 


    a        b      a + b          7      8     15
    c        d      c + d          8     37     45
  a + c    b + d      N           15     45     60


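Under the hypothesis of independence the exact probability, p, of specified
values of a, b, c, d given the marginal totals (a + b), (c + d), (a + c), (b + d)
can be written either in terms of factorials or binomial coefficients [1, 2]:


(1) $p = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{a!\,b!\,c!\,d!\,N!} = \frac{{}_{a+b}C_a \cdot {}_{c+d}C_c}{{}_N C_{a+c}}.$


Using binomial coefficients for our example,


$p = \frac{{}_{15}C_7 \cdot {}_{45}C_8}{{}_{60}C_{15}} = \frac{6435 \cdot 2156 \cdot 10^5}{5319 \cdot 10^{10}} = .02608.$


Use of binomial coefficients in the calculation of cumulative probabilities,
P, for a given table and those more extreme than it permits the possibility
of cumulating cross products on a desk calculator as follows:


(2) $P = \frac{{}_{a+b}C_a \cdot {}_{c+d}C_c + {}_{a+b}C_{a+1} \cdot {}_{c+d}C_{c-1} + {}_{a+b}C_{a+2} \cdot {}_{c+d}C_{c-2} + \cdots}{{}_N C_{a+c}}.$


For the numerical example


$P = \frac{{}_{15}C_7 \cdot {}_{45}C_8 + {}_{15}C_8 \cdot {}_{45}C_7 + {}_{15}C_9 \cdot {}_{45}C_6 + \cdots}{{}_{60}C_{15}}$

$\quad = \frac{6435 \cdot 2156 \cdot 10^5 + 6435 \cdot 4538 \cdot 10^4 + 5005 \cdot 8145 \cdot 10^3}{5319 \cdot 10^{10}} = .03234.$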
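
The two calculations above can be reproduced with exact integer binomial coefficients. This is a minimal sketch, assuming Python's standard `math.comb`; the function names are hypothetical, and the loop assumes, as in the example, that the more extreme tables are those with larger a.

```python
from math import comb


def table_probability(a, b, c, d):
    """Exact probability of one 2 x 2 table with fixed margins, as in (1)."""
    n_total = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n_total, a + c)


def tail_probability(a, b, c, d, terms=None):
    """Cumulate p over the given table and the more extreme ones, as in (2).

    Assumption: 'more extreme' means a increases while the margins stay fixed.
    If terms is None, every admissible table in that direction is included.
    """
    total, used = 0.0, 0
    while b >= 0 and c >= 0 and (terms is None or used < terms):
        total += table_probability(a, b, c, d)
        a, b, c, d = a + 1, b - 1, c - 1, d + 1
        used += 1
    return total


# Numerical example from the text: a = 7, b = 8, c = 8, d = 37.
p_single = table_probability(7, 8, 8, 37)
p_three_terms = tail_probability(7, 8, 8, 37, terms=3)
p_four_terms = tail_probability(7, 8, 8, 37, terms=4)
```

Because the coefficients here are exact rather than rounded to four significant figures, the results can differ from the printed values in the fifth decimal place.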
This is the probability for a one-tailed test based on three terms only. No
more than four or five terms are generally necessary to obtain a fairly accurate
probability. It can be shown that in the critical region of the distribution the


Binomial Coefficients, $_nC_r$

r or n-r    n=1    n=2    n=3    n=4    n=5    n=6    n=7    n=8    n=9   n=10

   1          1      2      3      4      5      6      7      8      9     10
   2                 1      3      6     10     15     21     28     36     45
   3                        1      4     10     20     35     56     84    120
   4                               1      5     15     35     70    126    210
   5                                      1      6     21     56    126    252

r or n-r    n=11    n=12    n=13    n=14    n=15    n=16    n=17    n=18    n=19    n=20

   1          11      12      13      14      15      16      17      18      19      20
   2          55      66      78      91     105     120     136     153     171     190
   3         165     220     286     364     455     560     680     816     969    1140
   4         330     495     715    1001    1365    1820    2380    3060    3876    4845
   5         462     792    1287    2002    3003    4368    6188    8568  1163-1  1550-1
   6         462     924    1716    3003    5005    8008  1238-1  1856-1  2713-1  3876-1
   7         330     792    1716    3432    6435  1144-1  1945-1  3182-1  5039-1  7752-1
   8         165     495    1287    3003    6435  1287-1  2431-1  4376-1  7558-1  1260-2
   9          55     220     715    2002    5005  1144-1  2431-1  4862-1  9238-1  1680-2
  10          11      66     286    1001    3003    8008  1945-1  4376-1  9238-1  1848-2

r or n-r    n=21    n=22    n=23    n=24    n=25    n=26    n=27    n=28    n=29    n=30

   1          21      22      23      24      25      26      27      28      29      30
   2         210     231     253     276     300     325     351     378     406     435
   3        1330    1540    1771    2024    2300    2600    2925    3276    3654    4060
   4        5985    7315    8855  1063-1  1265-1  1495-1  1755-1  2048-1  2375-1  2741-1
   5      2035-1  2633-1  3365-1  4250-1  5313-1  6578-1  8073-1  9828-1  1188-2  1425-2
   6      5426-1  7461-1  1009-2  1346-2  1771-2  2302-2  2960-2  3767-2  4750-2  5938-2
   7      1163-2  1705-2  2452-2  3461-2  4807-2  6578-2  8880-2  1184-3  1561-3  2036-3
   8      2035-2  3198-2  4903-2  7355-2  1082-3  1562-3  2220-3  3108-3  4292-3  5853-3
   9      2939-2  4974-2  8172-2  1308-3  2043-3  3125-3  4687-3  6907-3  1002-4  1431-4
  10      3527-2  6466-2  1144-3  1961-3  3269-3  5312-3  8436-3  1312-4  2003-4  3005-4
  11      3527-2  7054-2  1352-3  2496-3  4457-3  7726-3  1304-4  2147-4  3460-4  5463-4
  12      2939-2  6466-2  1352-3  2704-3  5200-3  9658-3  1738-4  3042-4  5190-4  8649-4
  13      2035-2  4974-2  1144-3  2496-3  5200-3  1040-4  2006-4  3744-4  6786-4  1198-5
  14      1163-2  3198-2  8172-2  1961-3  4457-3  9658-3  2006-4  4012-4  7756-4  1454-5
  15      5426-1  1705-2  4903-2  1308-3  3269-3  7726-3  1738-4  3744-4  7756-4  1551-5

r or n-r    n=31    n=32    n=33    n=34    n=35    n=36    n=37    n=38    n=39    n=40

   1          31      32      33      34      35      36      37      38      39      40
   2         465     496     528     561     595     630     666     703     741     780
   3        4495    4960    5456    5984    6545    7140    7770    8436    9139    9880
   4      3147-1  3596-1  4092-1  4638-1  5236-1  5890-1  6605-1  7382-1  8225-1  9139-1
   5      1699-2  2014-2  2373-2  2783-2  3246-2  3770-2  4359-2  5019-2  5758-2  6580-2
   6      7363-2  9062-2  1108-3  1345-3  1623-3  1948-3  2325-3  2761-3  3263-3  3838-3
   7      2630-3  3366-3  4272-3  5380-3  6725-3  8348-3  1030-4  1262-4  1538-4  1864-4
   8      7889-3  1052-4  1388-4  1816-4  2354-4  3026-4  3861-4  4890-4  6152-4  7690-4
   9      2016-4  2805-4  3857-4  5245-4  7061-4  9414-4  1244-5  1630-5  2119-5  2734-5
  10      4435-4  6451-4  9256-4  1311-5  1836-5  2542-5  3483-5  4727-5  6357-5  8477-5
  11      8467-4  1290-5  1935-5  2861-5  4172-5  6008-5  8550-5  1203-6  1676-6  2312-6
  12      1411-5  2258-5  3548-5  5484-5  8345-5  1252-6  1852-6  2707-6  3911-6  5587-6
  13      2063-5  3474-5  5732-5  9280-5  1476-6  2311-6  3562-6  5415-6  8122-6  1203-7
  14      2652-5  4714-5  8188-5  1392-6  2320-6  3796-6  6107-6  9670-6  1508-7  2321-7
  15      3005-5  5657-5  1037-6  1856-6  3248-6  5568-6  9364-6  1547-7  2514-7  4023-7
  16      3005-5  6011-5  1167-6  2204-6  4060-6  7308-6  1288-7  2224-7  3771-7  6285-7
  17      2652-5  5657-5  1167-6  2334-6  4538-6  8597-6  1591-7  2878-7  5102-7  8873-7
  18      2063-5  4714-5  1037-6  2204-6  4538-6  9075-6  1767-7  3358-7  6236-7  1134-8
  19      1411-5  3474-5  8188-5  1856-6  4060-6  8597-6  1767-7  3535-7  6892-7  1313-8
  20      8467-4  2258-5  5732-5  1392-6  3248-6  7308-6  1591-7  3358-7  6892-7  1378-8

error in omitting terms will be smaller than the probability of the last table
which is utilized and equal to or larger than the probability of the first table
which is not utilized. If the use of another term produces negligible change
in the P value, no further calculation is necessary. In the example, the


probability of the next term in the series is .00007, which changes P to .03241
and makes the error less than .00007.


Binomial Coefficients, $_nC_r$

r or n-r    n=41     n=42     n=43     n=44     n=45     n=46     n=47     n=48     n=49     n=50

   1          41        …
   2         820        …
   3      1066-1        …
   4      1013-2        …
   5      7494-2        …
   6      4496-3        …
   7      2248-4        …
   8      9555-4        …
   9      3503-5        …
  10      1121-6        …
  11      3159-6        …
  12      7899-6        …
  13      1762-7        …
  14      3524-7   5286-7   7838-7   1150-8   1669-8   2399-8   3416-8   4823-8   6752-8   9378-8
  15      6343-7   9867-7   1515-8   2299-8   3449-8   5117-8   7516-8   1093-9   1576-9   2251-9
  16      1031-8   1665-8   2652-8   4167-8   6466-8   9915-8   1503-9   2255-9   3348-9   4924-9
  17      1516-8   2547-8   4212-8   6864-8   1103-9   1750-9   2741-9   4244-9   6499-9   9847-9
  18      2021-8   3537-8   6084-8   1030-9   1716-9   2819-9   4569-9   7310-9  1155-10  1805-10
  19      2447-8   4468-8   8005-8   1409-9   2438-9   4154-9   6973-9  1154-10  1885-10  3041-10
  20      2691-8   5138-8   9606-8   1761-9   3170-9   5608-9   9762-9  1674-10  2828-10  4713-10
  21      2691-8   5383-8   1052-9   2013-9   3774-9   6944-9  1255-10  2231-10  3905-10  6733-10
  22      2447-8   5138-8   1052-9   2104-9   4117-9   7890-9  1483-10  2739-10  4970-10  8875-10
  23      2021-8   4468-8   9606-8   2013-9   4117-9   8233-9  1612-10  3096-10  5834-10  1080-11
  24      1516-8   3537-8   8005-8   1761-9   3774-9   7890-9  1612-10  3225-10  6321-10  1215-11
  25      1031-8   2547-8   6084-8   1409-9   3170-9   6944-9  1483-10  3096-10  6321-10  1264-11

r or n-r    n=51     n=52     n=53     n=54     n=55     n=56     n=57     n=58     n=59     n=60

   1          51        …
   2        1275        …
   3      2083-1        …
   4      2499-2        …
   5      2349-3        …
   6      1801-4        …
   7      1158-5        …
   8      6368-5        …
   9      3042-6        …
  10      1278-7        …
  11      4763-7        …
  12      1588-8        …
  13      4763-8        …
  14      1293-9   1769-9   2404-9   3245-9   4354-9   5805-9   7695-9  1014-10  1330-10  1735-10
  15      3189-9   4481-9   6250-9   8654-9  1190-10  1625-10  2206-10  2975-10  3990-10  5319-10
  16      7175-9  1036-10  1484-10  2109-10  2975-10  4165-10  5790-10  7996-10  1097-11  1496-11
  17     1477-10  2195-10  3231-10  4715-10  6825-10  9800-10  1396-11  1975-11  2775-11  3872-11
  18     2790-10  4267-10  6462-10  9693-10  1441-11  2123-11  3103-11  4500-11  6475-11  9250-11
  19     4846-10  7636-10  1190-11  1836-11  2806-11  4247-11  6370-11  9473-11  1397-12  2045-12
  20     7754-10  1260-11  2024-11  3214-11  5050-11  7856-11  1210-12  1847-12  2795-12  4192-12
  21     1145-11  1920-11  3180-11  5203-11  8417-11  1347-12  2132-12  3343-12  5190-12  7984-12
  22     1561-11  2705-11  4625-11  7805-11  1301-12  2143-12  3489-12  5622-12  8964-12  1415-13
  23     1966-11  3529-11  6234-11  1086-12  1866-12  3167-12  5310-12  8799-12  1442-13  2339-13
  24     2296-11  4264-11  7793-11  1403-12  2489-12  4355-12  7522-12  1283-13  2163-13  3605-13
  25     2480-11  4776-11  9039-11  1683-12  3086-12  5574-12  9929-12  1745-13  3028-13  5192-13
  26     2480-11  4959-11  9735-11  1877-12  3561-12  6646-12  1222-13  2215-13  3960-13  6989-13
  27     2296-11  4776-11  9735-11  1947-12  3824-12  7385-12  1403-13  2625-13  4840-13  8800-13
  28     1966-11  4264-11  9039-11  1877-12  3824-12  7649-12  1503-13  2907-13  5532-13  1037-14
  29     1561-11  3529-11  7793-11  1683-12  3561-12  7385-12  1503-13  3007-13  5913-13  1144-14
  30     1145-11  2705-11  6234-11  1403-12  3086-12  6646-12  1403-13  2907-13  5913-13  1183-14

It can also be shown that the following inequalities* exist:


$p\left(1 + \frac{ad}{(b+1)(c+1)}\right) \le P \le p\left(1 + \frac{ad}{(b+1)(c+1) - (a-1)(d-1)}\right),$


where p and P are defined in (1) and (2), and ad is taken to be the smaller
and bc the larger of the two products. For the example,


$.02608\left(1 + \frac{64}{304}\right) \le P \le .02608\left(1 + \frac{64}{255}\right),$


.03157 ≤ P ≤ .03263.
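
The bounds for the example can be checked against the full tail sum. The sketch below is an illustration only: it assumes the inequalities take the form p[1 + ad/((b+1)(c+1))] ≤ P ≤ p[1 + ad/((b+1)(c+1) − (a−1)(d−1))], which reproduces the divisors 304 and 255 of the example, and the function names are hypothetical.

```python
from math import comb


def table_probability(a, b, c, d):
    """Exact probability of one 2 x 2 table with fixed margins."""
    return comb(a + b, a) * comb(c + d, c) / comb(a + b + c + d, a + c)


def tail_bounds(a, b, c, d):
    """Lower and upper bounds on the one-tailed P from the single-table p.

    Assumed form of the inequalities (see lead-in). The cells are relabeled
    so that ad is the smaller of the two cross products, as the text requires.
    """
    p = table_probability(a, b, c, d)
    if a * d > b * c:
        a, b, c, d = b, a, d, c          # interchange the two columns
    first_ratio = a * d / ((b + 1) * (c + 1))
    lower = p * (1 + first_ratio)
    upper = p * (1 + a * d / ((b + 1) * (c + 1) - (a - 1) * (d - 1)))
    return lower, upper


def exact_tail(a, b, c, d):
    """Full one-tailed sum, moving toward a larger a*d - b*c difference."""
    step = 1 if a * d > b * c else -1
    total = 0.0
    while min(a, b, c, d) >= 0:
        total += table_probability(a, b, c, d)
        a, b, c, d = a + step, b - step, c - step, d + step
    return total


lower, upper = tail_bounds(7, 8, 8, 37)
P = exact_tail(7, 8, 8, 37)
```

The lower bound is simply the first two terms of the series; the upper bound replaces all later term ratios by a common geometric ratio that is at least as large as each of them.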


In the accompanying table, binomial coefficients, $_nC_r$, are given to
four significant digits. Values were calculated to six significant digits, rounded
off to four, and then checked against a recent source [5]. A table of logarithms
of binomial coefficients, available in Hald [4] to n = 100, can be substituted
for a table of binomial coefficients.

In the case of r × 2 tables, the probability of a specified table given the
column sums m and n, and row sums (a + b), (c + d), (e + f), ···, is


$\frac{(a+b)!\,(c+d)!\,(e+f)! \cdots m!\,n!}{a!\,b!\,c!\,d!\,e!\,f! \cdots N!} = \frac{{}_{a+b}C_a \cdot {}_{c+d}C_c \cdot {}_{e+f}C_e \cdots}{{}_N C_m}.$








Here also the use of binomial coefficients is economical. The procedure is 
still laborious, however, since it is necessary to lay out all of the possible 
tables to find those which are equally or less probable than the one in ques- 
tion [3]. 


REFERENCES
[2] Fisher, R. A. Statistical methods for research workers. New York: Hafner, 1954.
[3] Freeman, G. H. and Halton, J. H. Note on an exact treatment of contingency, goodness
of fit, and other problems of significance. Biometrika, 1951, 38, 141-149.
[4] Hald, A. Statistical tables and formulas. New York: Wiley, 1952.
[5] Miller, J. C. P., editor. Tables of binomial coefficients. Royal Society mathematical
tables Vol. 3. Cambridge: University Press, 1954.

to a similar set of inequalities, which we modified slightly.

A STOCHASTIC MODEL FOR ROTE SERIAL LEARNING
RICHARD C. ATKINSON*


INDIANA UNIVERSITY†


A model for the acquisition of responses in an anticipatory rote serial 
learning situation is presented. The model is developed in detail for the case 
of a long intertrial interval and employed to fit data where the list length 
is varied from 8 to 18 words. Application of the model to the case of a short 
intertrial interval is considered; some predictions are derived and checked 
against experimental data. 


This paper represents a preliminary attempt at quantitative theorizing 
in the area of rote serial learning. The model is applicable to experimental 
situations employing the anticipation method [6] and deals with the acquisi- 
tion of correct responses, anticipatory responses, perseverative responses, 
and failures-to-respond. In addition, direct applicability of the model is 
limited to situations restricted as follows: (a) moderate presentation rate, 
(b) dissimilar intralist words, (c) familiar and easily pronounced words. The 
explanation for these restrictions is considered later. 


Model 


The model makes use of the conceptual formulation of the stimulating 
situation introduced by Estes [3] and elaborated by Estes and Burke [4]. 
The general assumptions are: (a) the effect of a stimulating situation upon 
an organism is made up of many component events; (b) when a situation is 
repeated over a series of trials, any one of these component stimulating 
events may occur on some trials and fail to occur on others. Rather than 
review the rationale of these assumptions, the reader is referred to the Estes- 
Burke paper which is helpful to an understanding of the present work. 

Figure 1 schematically presents the rote serial learning situation. 
The successive word exposures in a list of r + 1 words are indicated by
$W_1, W_2, \cdots, W_r, W_{r+1}$ where $W_1$ is the cue for S’s first anticipation on
each run through the list. $R'_i$ represents a hypothesized covert response
associated with the i + 1st word presentation; the response of “reading”
$W_{i+1}$. On the other hand, $R_j(i)$ is the response recorded by the experimenter
to the ith word presentation and can be either (a) a correct anticipation
of the i + 1st word when j = i, (b) an incorrect anticipation when j ≠ i,
or (c) a failure-to-respond when the j subscript is omitted. (Symbols and
their meanings are listed in Appendix B.)

*The author wishes to thank Professors C. J. Burke and W. K. Estes for advice
and assistance in carrying out this research.
†Now at Stanford University.

FIGURE 1. Schematic representation of the anticipatory rote serial learning situation.


A period h is defined as the time of a single word exposure, and a trial 
refers to one run through the list. Since the removal of one word is followed 
immediately by the presentation of the next, a trial is of time h(r + 1). The 
intertrial interval is represented as a series of k subintervals each of length h; 
thus, the intertrial interval is of time kh. When there are r + 1 words in a 
list, the list length is designated as r; this reflects the fact that the r + 1st 
word is not a cue for an anticipatory response. 

The ith word presentation is represented conceptually as a set of stimulus
elements $S_i$ where the sets are pairwise disjoint, and hence the intersection
of the r + 1 sets is the null set. The number of elements in $S_i$ is N, where N
is invariant over i, and a parent set S* is defined such that the union of the
r + 1 sets is a subset of S*. On a given presentation of the ith word a sample
of elements from $S_i$ is effective; the likelihood of any element from $S_i$ being
in the sample is $\theta_i$ where $0 < \theta_i < 1$. (Derivations presented in this paper
are carried out under the simplifying assumption that all elements in $S_i$
are equally likely to occur on any trial.) Therefore, given the ith word presentation,
a sample is drawn from $S_i$ of size $N\theta_i$.

Conditional relations, or connections, between response classes and
stimulus elements are defined as in other papers on statistical learning
theory. The response classes $R_1, R_2, \cdots, R_r$, and R (failure-to-respond)
define a partition of S* into subsets $S^*_{R_1}, S^*_{R_2}, \cdots, S^*_{R}$. Elements in $S^*_{R_1}$
are said to be conditioned to the response class $R_1$, etc. The concept of a
partition implies that every element of S* must be conditioned to either
$R_1, R_2, \cdots$, or R, but that no element may be conditioned to more than one.
For each element in $S_i$ a quantity F(i; j; n) is defined which represents the
probability that an element from set $S_i$ is conditioned to response class $R_j$
at the start of trial n. At times this notation is unnecessarily detailed; the
abbreviation C(i; n) is introduced to designate the probability that an
element from $S_i$ is conditioned at the start of trial n to a correct anticipatory
response.

The anticipatory response at position i on trial n is assumed to be a
function of the stimulus elements sampled from $S_i$ on that trial. Specifically,
the probability of $R_j(i)$ is the ratio of the number of sampled elements
from $S_i$ conditioned to the response class $R_j$ to the number of elements
sampled from $S_i$. Since $\theta_i$ is constant for all elements in $S_i$, the probability
of $R_j(i)$ on trial n is the expected value of F(i; j; n).

For each element sampled from $S_i$ on trial n it is postulated that there is:
(a) a probability $\lambda$ that the element is returned to S* during the h-interval
immediately following the one in which it was sampled; (b) a probability
$\lambda(1 - \lambda)$ that it is returned to S* during the second h-interval following
the one in which it was sampled; (c) a probability $\lambda(1 - \lambda)^2$ that it is returned
to S* during the third h-interval following the one in which it was sampled;
and so on. The probability that an element will be eventually returned to
S* is unity since


(1) $\sum_{x=0}^{\infty} \lambda(1 - \lambda)^x = 1.$
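
The return schedule is a geometric distribution over h-intervals, which can be illustrated numerically. In this minimal sketch the value of λ and the function names are hypothetical; the check is that the probability already returned after k intervals plus the probability still available equals one.

```python
def return_probability(x, lam):
    """Probability of return to S* in the (x + 1)th h-interval after sampling."""
    return lam * (1 - lam) ** x


def still_available(k, lam):
    """Probability that an element has not yet been returned after k h-intervals."""
    return (1 - lam) ** k


lam = 0.3  # hypothetical value of the return parameter
returned_by_50 = sum(return_probability(x, lam) for x in range(50))
remaining_after_50 = still_available(50, lam)
```

As k grows the remaining probability (1 − λ)^k tends to zero, which is the content of (1).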


The phrase “available at position i” is used to refer to an element sampled
from some set and not yet returned to S* during the h-interval in which
$W_i$ is presented. The notion of an element being available at a position
other than the one at which it was sampled is one way of formalizing the
concept of trace stimuli. Parenthetically, note that the probability of an
anticipatory response at position i is defined in terms of the stimulus elements
sampled from $S_i$ and is not affected by elements which are available at
position i but sampled from a stimulus set other than $S_i$.

The conditioned status of elements sampled from $S_i$ upon their return
to S* depends on the anticipatory response made at position i. If a sample
is drawn from $S_i$ which elicits a correct anticipatory response, $R_i(i)$, then
all elements in the sample become conditioned to the response class $R_i$ and,
independent of the time that an element is available, are returned to S*
conditioned to that response class. On the other hand, if the sample elicits
a response other than a correct one, all elements in the sample revert to
being conditioned to the response class R, and there is a specified probability
that the elements will be conditioned to the $R'_j$ responses which occur before
they are returned to S*. That is, given an incorrect anticipation or a failure-to-respond,
all sampled elements become conditioned to the response class
R and then: (a) a proportion $\beta$ of the sampled elements are conditioned to
the response class $R_i$ when $R'_i$ occurs, and $(1 - \beta)$ remain unchanged; (b) $\lambda$ of
the elements are then returned to S* and $(1 - \lambda)$ remain available during
the next h-interval where, again, $\beta$ of the remaining elements are conditioned
to the response class $R_{i+1}$ when $R'_{i+1}$ occurs, and $(1 - \beta)$ remain as they
were in the previous interval; (c) $\lambda(1 - \lambda)$ are now returned to S* and
$(1 - \lambda)^2$ are carried on where $\beta$ are connected to the response class $R_{i+2}$
when $R'_{i+2}$ occurs and $(1 - \beta)$ remain as they were in the previous interval;
and so on.

Finally, it is assumed that nothing which occurs during the intertrial 
interval will change the conditional status of the elements not yet returned 
to S* at the beginning of this interval. That is, elements returned during 
h-intervals of the intertrial interval have the same conditional status as 
elements returned in the last h-interval of the list presentation. 

More generally stated, if a sample of elements elicits a response which is 
confirmed as correct (reinforced), then each element in the sample becomes 
conditioned to that response and will remain conditioned unless the element 
is sampled at some later trial, and this new sample elicits an incorrect response. 
If a sample leads to an incorrect response, then the elements in the sample
revert to being conditioned to the response class R and have a probability
$\beta$ of being conditioned to the response class $R_j$ associated with the $R'_j$ responses
which occur before the element is returned to S*. The conditioning
proportion $\beta$ can be interpreted as the probable occurrence of the implicit
response $R'_i$ to the i + 1st word presentation. This interpretation does not
affect the quantitative formulation of the model.

The present analysis of serial responding requires a modification of the
notion of a sampling constant introduced in other papers on statistical
learning theory. $\theta_i$ is postulated to be a function of the number and order
of the words that have preceded the ith word. Once again, consider intervals
of time h. If the word exposure has been preceded by an infinite number of
h-intervals which do not contain word exposures, then the sampling constant
is $\theta_0$; if, on the other hand, the word exposure has been preceded by an
infinite number of h-intervals each of which contained a word exposure,
the sampling constant is $\theta_\infty$. Let $c = \theta_0 - \theta_\infty$, where c > 0 and, necessarily,
c < 1. Further, designate a decay constant $\eta$ such that $0 < \eta < 1$. If a
series of successive word exposures occur, and are preceded by an infinite
number of h-intervals which do not contain word exposures, then (a) the
sampling constant associated with the second word exposure is $\theta_0 - c\eta$;
(b) the sampling constant associated with the third word is $\theta_0 - c[\eta + \eta(1 - \eta)]$;
(c) the sampling constant for the fourth word is
$\theta_0 - c[\eta + \eta(1 - \eta) + \eta(1 - \eta)^2]$; and so on. Thus, if the intertrial interval is infinite
(i.e., each run through the list is preceded by an infinite number of h-intervals
which do not contain word exposures), the sampling constant associated
with set $S_i$ on any run through the list is


(2) $\theta_i = \theta_0 - c[1 - (1 - \eta)^{i-1}].$


An inspection of this equation indicates that $\theta_i$, defined over list positions,
has a maximum at position one and approaches $0 < \theta_0 - c < 1$ as i becomes
large.
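
The step from the word-by-word decrements to the closed form (2) is a partial geometric sum. The sketch below compares the two; the parameter values and function names are hypothetical and chosen only for illustration.

```python
def sampling_constant(i, theta_0, c, eta):
    """Closed form (2): sampling constant for list position i (i = 1, 2, ...)."""
    return theta_0 - c * (1 - (1 - eta) ** (i - 1))


def sampling_constant_stepwise(i, theta_0, c, eta):
    """Same quantity built up as in (a), (b), (c): one decrement per preceding word."""
    decrement = sum(eta * (1 - eta) ** k for k in range(i - 1))
    return theta_0 - c * decrement


# Hypothetical parameter values, not estimates from the paper.
theta_0, c, eta = 0.8, 0.5, 0.2
thetas = [sampling_constant(i, theta_0, c, eta) for i in range(1, 19)]
stepwise = [sampling_constant_stepwise(i, theta_0, c, eta) for i in range(1, 19)]
```

The list `thetas` starts at θ_0 for position one and decreases toward θ_0 − c, as stated in the text.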

The formulation of the sampling constant requires a uniform activity
during intervals which do not contain word exposures; $\theta_0$ is postulated to
be a function of the type of activity.

The equations specified by the above assumptions can now be written.
Consider the case in which the intertrial interval is “long,” for purposes of
the model infinite. This case proves to be simpler than that in which the
intertrial interval is “short” because in the infinite interval all elements
sampled from $S_i$ on trial n are returned to S* before the beginning of trial
n + 1 (see equation 1). (Perseverative errors are not possible for the infinite
intertrial interval, and their consideration is deferred until discussion of the
short interval case.)

Given a list length r and an infinite intertrial interval, the expected
values of the probabilities of correct anticipatory responses on trial n + 1
to the exposure of $W_r$, $W_{r-1}$, and $W_{r-2}$ are


(3) $C(r; n+1) = (1 - \theta_r)C(r; n) + \theta_r\{C(r; n) + [1 - C(r; n)]\beta\},$

(4) $C(r-1; n+1) = (1 - \theta_{r-1})C(r-1; n) + \theta_{r-1}\{C(r-1; n) + [1 - C(r-1; n)][\lambda\beta + (1 - \lambda)\beta(1 - \beta)]\},$

(5) $C(r-2; n+1) = (1 - \theta_{r-2})C(r-2; n) + \theta_{r-2}\{C(r-2; n) + [1 - C(r-2; n)][\lambda\beta + \lambda(1 - \lambda)\beta(1 - \beta) + (1 - \lambda)^2\beta(1 - \beta)^2]\}.$


More generally,

(6) $C(i; n+1) = (1 - \theta_i)C(i; n) + \theta_i\{C(i; n) + [1 - C(i; n)]\beta A_i\},$

where

(7) $A_i = \lambda \, \frac{1 - [(1 - \lambda)(1 - \beta)]^{r-i}}{1 - (1 - \lambda)(1 - \beta)} + [(1 - \lambda)(1 - \beta)]^{r-i}.$


Inspection of (7) indicates that $A_i$, defined over list positions, is bounded
between zero and unity. The function assumes a minimum at position one
and increases as i becomes large to a maximum value of unity at position r.
The solution of difference equation (6) is


(8) $C(i; n) = 1 - [1 - C(i; 0)][1 - \theta_i \beta A_i]^n$
(cf. [5]).


Similar sets of equations (see Appendix A) can be written for the probability
of an anticipatory error and failure-to-respond. However, for simplicity,
analysis is limited here to C(i; n).

For the typical rote serial learning situation, assume C(i; 0) = 0; that is,
on the first run through the list S will make no correct anticipations. The
probability of an error on trial n at position i is [1 − C(i; n)], and the number
of errors at position i during the first x + 1 trials is


(9) $\sum_{n=0}^{x} [1 - C(i; n)] = \frac{1 - [1 - \theta_i \beta A_i]^{x+1}}{\theta_i \beta A_i}.$


As x becomes large this expression approaches

(10) $1/(\theta_i \beta A_i).$
Application to Data 


Data have been collected for different list lengths with a one-minute
intertrial interval [1]. The lists were composed of familiar and easily
pronounced two-syllable adjectives; no two words possessed similar meaning
or phonetic construction. The data on total number of errors over the first
16 trials (Figure 2) are based on the records of 42 Ss obtained in a situation
employing a Latin square design. Evidence on intertrial interval [1] suggests
that the one-minute period experimentally approximates the theoretical
infinite intertrial interval. Therefore equations (2) and (10) are applicable.
These equations were employed to provide a visual fit to data for the list in
which r equals 18; the obtained parameter values were $\lambda$ = .41, $\beta$ = .55,
$\theta_1$ = 1.00, c = .64, and $\eta$ = .35. These values were substituted in equations
(2) and (10) to yield predicted curves for r equal to 8 and 13. An inspection
of Figure 2 indicates close agreement between predicted and observed values.

[Figure 2. Theoretical and observed values of mean number of errors by serial
positions over the first 16 trials for lists of length 8, 13, and 18.]
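
The predicted curves can be reproduced numerically. The following is a minimal sketch, assuming the forms of equations (2), (7), and (10) given in the text and the reported parameter values; the function names are illustrative and do not come from the paper.

```python
# Parameter values reported in the text for the visual fit to the r = 18 list.
LAM, BETA, THETA_1, C, ETA = 0.41, 0.55, 1.00, 0.64, 0.35


def theta(i, theta_1=THETA_1, c=C, eta=ETA):
    """Equation (2): sampling constant at list position i (1-based)."""
    return theta_1 - c * (1.0 - (1.0 - eta) ** (i - 1))


def a_coef(i, r, lam=LAM, beta=BETA):
    """Equation (7): A_i for position i in a list of length r."""
    q = (1.0 - lam) * (1.0 - beta)
    return lam * (1.0 - q ** (r - i)) / (1.0 - q) + q ** (r - i)


def asymptotic_errors(i, r):
    """Expression (10): limiting total number of errors at position i."""
    return 1.0 / (theta(i) * BETA * a_coef(i, r))


def error_curve(r):
    """Predicted errors for every serial position of a list of length r."""
    return [asymptotic_errors(i, r) for i in range(1, r + 1)]


if __name__ == "__main__":
    for r in (8, 13, 18):
        print(r, [round(e, 2) for e in error_curve(r)])
```

Under these assumptions the curve for each list length rises from position one to an interior maximum and then falls toward the end of the list, because $\theta_i$ decreases while $A_i$ increases with position.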


Discussion 


In the introduction the class of rote serial learning experiments to which 
the model is presumed to apply was delimited. The reasons for these restric- 
tions are: 

(a) Moderate presentation rate. A presentation rate that is too rapid
would tend to decrease the likelihood of overt verbal responses and lead to
an increase in the number of failures-to-respond. Consequently the model
when applied to conditions of rapid presentation would underestimate the
observed number of failures-to-respond. On the other hand, the model assumes
that a single sample is drawn from $S_i$ during the $W_i$ exposure, an assumption
which itself depends on a short exposure period. Experimentally these
difficulties can be resolved by a short word exposure period followed by a blank
exposure during which S provides an anticipation or failure-to-respond. An
extension of the model to the case of a rapid rate has been examined, but
the equations will not be displayed here.

(b) Highly dissimilar words. It is required in the model that the $S_i$
sets be pairwise disjoint. This simplifying assumption is suspect for any
serial learning situation, but it appears to provide an adequate approximation
in this restricted situation. For the case of highly similar list words a set of
elements common to each $S_i$ would be introduced; the additional problems
generated in this case are not considered here.

(c) Familiar and easily pronounced words. For the model, this restriction
refers to a state such that the occurrence of the hypothesized $W_i$-$R'_{i-1}$
relation is invariant over trials. For nonsense syllable learning the model
would require, as an additional feature, a function describing the acquisition
over trials of the $W_i$-$R'_{i-1}$ connection [7].

In analyzing the model, the case where the intertrial interval is long 
has been considered. With a short interval the equations become more 
complex. Now some elements sampled on trial n remain available throughout 
the intertrial interval and into the next run through the list. For example, 
assume that an element is sampled from §S,_, on trial m and not returned to 
S* for five h-intervals; the probability of this event is \(1 — \)*0,_, . When 
k = 1, the element will be returned after the occurrence of R{ on trial n + 1. 
Consequently, there is a probability 6[1 — C(r — 1; n)] that this element is 
conditioned to the response class R, . The element, when sampled again, 
increases the likelihood of an R, anticipatory response which, at position 
r — 1, would be classified as a perseverative error. It follows that the shorter 
the intertrial interval the greater the number of perseverative errors. This 
result has been experimentally verified [1]. 


Appendix A
Probability of a Failure-to-Respond and an Anticipatory Error


For the case of an infinite intertrial interval the probability of a
failure-to-respond at position i on trial n + 1 is


(11) $R(i; n+1) = (1 - \theta_i)R(i; n) + \theta_i[1 - C(i; n)](1 - \beta)A_i$.

The solution [5, p. 584] of this difference equation is


(12) $R(i; n) = (1 - \theta_i)^n R(i; 0) + \frac{(1 - \beta)A_i}{1 - \beta A_i}[(1 - \theta_i\beta A_i)^n - (1 - \theta_i)^n]$,

where R(i; 0) is the probability of a failure-to-respond on the initial run
through the list. The probability of an anticipatory error is


(13) $A(i; n) = 1 - C(i; n) - R(i; n)$.


For the typical experimental situation, assume C(i; 0) = 0 and R(i; 0) = 1;
then (13) reduces to


(14) $A(i; n) = \frac{1 - A_i}{1 - \beta A_i}[(1 - \theta_i\beta A_i)^n - (1 - \theta_i)^n]$.

Equations (12) and (14), when summed over the first x trials, as was done in (9)
for incorrect responses, produce functions for failures-to-respond and
anticipatory errors of the form reported by Deese and Kresse [2].
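
As a consistency check, the closed-form solutions can be compared with direct iteration of the difference equations. The sketch below assumes the forms of (6), (8), (11), (12), and (14) as written above, with C(i; 0) = 0 and R(i; 0) = 1; the function name and the test values are hypothetical.

```python
def max_closed_form_deviation(theta, beta, a, trials=25):
    """Iterate (6) and (11) and compare with closed forms (8), (12), (14).

    theta, beta, a stand for theta_i, beta, and A_i at one list position.
    Returns the largest absolute discrepancy found over the given trials.
    """
    c, r = 0.0, 1.0                      # C(i;0) = 0, R(i;0) = 1
    k = theta * beta * a
    worst = 0.0
    for n in range(trials + 1):
        decay = (1 - k) ** n - (1 - theta) ** n
        c_closed = 1 - (1 - k) ** n                                      # (8)
        r_closed = (1 - theta) ** n + (1 - beta) * a / (1 - beta * a) * decay  # (12)
        a_closed = (1 - a) / (1 - beta * a) * decay                      # (14)
        worst = max(
            worst,
            abs(c - c_closed),
            abs(r - r_closed),
            abs((1 - c - r) - a_closed),                                 # (13)
        )
        # Advance one trial; both updates use the old value of c.
        c, r = (
            c + theta * (1 - c) * beta * a,                              # (6)
            (1 - theta) * r + theta * (1 - c) * (1 - beta) * a,          # (11)
        )
    return worst


if __name__ == "__main__":
    print(max_closed_form_deviation(0.5, 0.55, 0.6))
```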


Appendix B
List of Symbols and Their Meanings


A(i; n) probability of an anticipatory error at position i on trial n.
$\beta$ conditioning constant associated with an incorrect anticipation.


rv] 

c 6, Be) 2 

C(i; n) probability of a correct anticipation at position i on trial n.
$A_i$ function defined over i; dependent on r, $\lambda$, and $\beta$.

$\eta$ decay constant related to the decrement in $\theta_i$ as i increases.


h time of a single word exposure. 
] number of h-intervals in the intertrial interval. 


$\lambda$ probability that an available element will be returned to S* during
the next h-interval.

n number of trial. 

r list length. 

$R'_i$ hypothesized covert response; reading $W_{i+1}$.

$R_i$ response class; overt anticipation of $W_{i+1}$.

R response class; failure-to-respond. 


R,(z) _R, recorded by experimenter to W, . 
R(i; n) probability of a failure-to-respond at position i on trial n.


S* set of stimulus elements of which all $S_i$ are subsets.

$S_i$ set of stimulus elements associated with $W_i$.

$\theta_i$ probability of sampling an element from $S_i$ when $W_i$ occurs.

$W_i$ ith word presentation, where $W_1$ is cue for first anticipation.
A COMPUTATIONAL PROCEDURE FOR TAU CORRELATION*


DESMOND S. CARTWRIGHT 


UNIVERSITY OF CHICAGO 


The tau coefficient is defined, and a computational procedure for tied 
ranks is described. The procedure maintains continuous computational 
checks, saves labor, and particularly facilitates the use of tau with large
samples. It is also shown how tau correlation may be applied to Q-sorts with 
any shape of forced distribution or with unforced distributions. 


It frequently happens in psychological research that the only appropriate 
test for an hypothesis is one that does not assume normality for the variates 
concerned. Where correlation is at issue, a non-parametric coefficient is
required. Spearman's rho is well known. Kendall's tau [3], a newer method of
rank correlation, has several advantages over rho. Most important is the fact
that the significance of a sample tau can be accurately evaluated on the basis
of the normal probability integral for n > 10, while for n < 10 exact tables are
available. The chief disadvantage of tau is the computational labor involved. 
This paper presents a method of computation designed especially for large 
samples and multiple ties. 


Definition 


Among n individuals there are n(n − 1)/2 relations between pairs. If
the individuals be ranked on two variates, each pair can agree or differ as to
the order of ranks. Tau is defined as


(1) $\tau = \frac{P - Q}{n(n - 1)/2}$,


where n is the number of individuals, P is the number of pairs having the same
rank order on both variates, and Q is the number of pairs having inverse
orders. If the two rankings agree perfectly, then P = n(n − 1)/2, Q = 0,
and $\tau$ = +1.00. If one ranking is a perfect inversion of the other, then Q =
n(n − 1)/2, P = 0, and $\tau$ = −1.00.


*The procedure described was developed in connection with research at the Counseling 
Center, University of Chicago. The research is supported by a grant (PHS M 903) from 
the National Institute of Mental Health, of the National Institutes of Health, Public 
Health Service. 
Alternate Formulas


Kendall has given formulas equivalent to (1) which reduce computation
labor. For example,


(2) $\tau = 1 - \frac{2Q}{n(n - 1)/2}$.

The transformation from (1) to (2) is

(3) $\frac{P - Q}{n(n - 1)/2} = \frac{P + Q}{n(n - 1)/2} - \frac{2Q}{n(n - 1)/2}$,


which holds when P + Q = n(n − 1)/2. This is true if, and only if, there
are no ties. Tied pairs can generate neither an agreement nor an inversion.
Tied pairs can generate only zero scores. Let the sum of these be Z. In the
tied case n(n − 1)/2 = P + Q + Z. Hence the transformation (3) is
impossible. It is for this reason that Bright's recent procedure for computing
$\tau$ [1] is inappropriate for the tied case. Only formula (1) is appropriate here.
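
A small numerical illustration of why the short-cut fails with ties. This sketch assumes that formula (2) has the form 1 − 2Q/[n(n − 1)/2]; the function names are illustrative, and the tied counts (P = 13, Q = 9, n = 8) are those of the first worked example in the Procedure section.

```python
def tau_formula_1(p, q, n):
    """Formula (1): (P - Q) divided by the number of pairs."""
    return (p - q) / (n * (n - 1) / 2)


def tau_formula_2(q, n):
    """Assumed form of formula (2): 1 - 2Q / [n(n - 1)/2]."""
    return 1 - 2 * q / (n * (n - 1) / 2)


if __name__ == "__main__":
    # Untied case: n = 4, so P + Q = 6 and the two formulas agree.
    print(tau_formula_1(5, 1, 4), tau_formula_2(1, 4))
    # Tied case: P + Q = 22 < 28, so formula (2) overstates the coefficient.
    print(tau_formula_1(13, 9, 8), tau_formula_2(9, 8))
```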


Procedure 
If only one ranking contains ties, arrange the paired ranks as nearly as
possible in natural order from left to right on the tied variate. Bracket each
tied set. An example follows, with $R_1$ the tied variate:


B24. 18 £14.79 2 
eles Hi oie we 





P = 5 + 3 + 3 + 1 + 1 + 0 + 0 + 0 = 13
Q = 2 + 1 + 1 + 3 + 2 + 0 + 0 + 0 = 9
Z = 0 + 2 + 1 + 0 + 0 + 2 + 1 + 0 = 6
Check: Σ = P + Q + Z = 7 + 6 + 5 + 4 + 3 + 2 + 1 + 0 = 28


Certain rules of procedure may be set out as follows: 

(1) There is an agreement, at any given number in $R_2$, for every
larger number to the right of it which is not in the same
bracket.

(2) There is an inversion, at any given number in $R_2$, for every
smaller number to the right of it which is not in the same
bracket.

(3) There is a zero, at any given bracketed number in $R_2$, for
every number to the right of it in the same bracket.


Every given pair of individuals generates either an agreement (larger
number to the right), an inversion (smaller number to the right), or a zero
(in the same bracket). For every ith individual, proceeding from left to right,
there will be n − i to the right, each of which must generate an agreement
or an inversion or a zero. A check row is affixed, with entries n − i; for each
individual, (agreements + inversions + zeros) = n − i. Also for the totals,
P + Q + Z = n(n − 1)/2 = Σ(n − i).

If there are ties in both rankings, arrange the paired ranks as nearly as
possible in natural order from left to right on one tied variate $R_1$. Bracket
each set tied on $R_1$. Add one more rule to the three given above:


(4) There is a zero, at any given number in $R_2$, for every equal
number to the right of it which is not in the same bracket.


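$R_1$:   1   2  [ 5   5   5   5   5]   8  [10  10  10]  12  13
$R_2$:   4   5  [ 2   6   8   8   8]  13  [11   2   2]  11  11
$Z_1$:   0   0    4   3   2   1   0    0    2   1   0    0   0
$Z_2$:   0   0    2   0   0   0   0    0    2   0   0    1   0


P = 9 + 8 + 4 + 4 + 4 + 4 + 4 + 0 + 0 + 2 + 2 + 0 + 0 = 41
Q = 3 + 3 + 0 + 2 + 2 + 2 + 2 + 5 + 0 + 0 + 0 + 0 + 0 = 19
Z = $Z_1$ + $Z_2$ = 0 + 0 + 6 + 3 + 2 + 1 + 0 + 0 + 4 + 1 + 0 + 1 + 0 = 18

Check: Σ = P + Q + Z = 12 + 11 + 10 + 9 + 8 + 7 + 6 + 5 + 4 + 3 + 2 + 1 + 0 = 78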


For clarity the zero scores arising from rules (3) and (4) are separated
in tabulation. Zeros arising from (3) are entered in the row $Z_1$. Zeros arising
from (4) are entered in row $Z_2$. The row Z is given by $Z_1$ + $Z_2$.
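
Rules (1) through (4) amount to classifying every pair of individuals once. The following is a minimal sketch of that pairwise count, assuming $R_1$ is already arranged in natural order so that a bracket is simply a run of equal $R_1$ values. The $R_2$ values are those of the second example; the $R_1$ values beyond the bracket pattern are the mid-ranks implied by that pattern, and the function name is illustrative.

```python
def count_by_rules(r1, r2):
    """Return (P, Q, Z1, Z2) by examining every pair, left to right.

    r1: ranks on the bracketing variate, arranged in natural order.
    r2: ranks on the other variate, paired with r1.
    """
    n = len(r1)
    p = q = z1 = z2 = 0
    for i in range(n):
        for j in range(i + 1, n):
            if r1[i] == r1[j]:
                z1 += 1          # rule (3): same bracket
            elif r2[j] > r2[i]:
                p += 1           # rule (1): larger number to the right
            elif r2[j] < r2[i]:
                q += 1           # rule (2): smaller number to the right
            else:
                z2 += 1          # rule (4): equal number, different bracket
    return p, q, z1, z2


R1 = [1, 2, 5, 5, 5, 5, 5, 8, 10, 10, 10, 12, 13]
R2 = [4, 5, 2, 6, 8, 8, 8, 13, 11, 2, 2, 11, 11]

if __name__ == "__main__":
    p, q, z1, z2 = count_by_rules(R1, R2)
    n = len(R1)
    print(p, q, z1 + z2, p + q + z1 + z2 == n * (n - 1) // 2)
```

This direct count examines all n(n − 1)/2 pairs, which is the labor the B-chart described next is meant to reduce.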

Our second example has n = 13. With 78 relations between pairs to be 
examined, computation is already laborious. A short-cut is provided by the 
following method. 

After setting out $R_1$ and $R_2$ as before, $R_2$ is given a separate tabular
form. This is called a "B-chart". The stub-head contains all ranks and
mid-ranks of $R_2$ written in natural order. For any set of ties only one mid-rank
is represented. The top row of the table proper shows the number of
individuals in each set. Let that number be u. Columns are then filled in with entries
decreasing successively by 1 from u until unity is reached. The complete
B-chart for our second example appears like this:


Rank or mid-rank:         2   4   5   6   8   11   13

Number of individuals:    3   1   1   1   3    3    1
                          2               2    2
                          1               1    1
The B-chart gives an orderly representation of $R_2$. The sum of numbers
in the top row is equal to n. Each entry in the top row gives the number of
individuals having the rank or mid-rank shown in the stub-head. Thus there
are three individuals with mid-rank of 8. The chart shows at a glance how
many individuals have a rank number larger than a given number. Thus the
sum of numbers in the top row to the right of column 8 is equal to 4. It shows
how many individuals have a rank number smaller than a given number. Thus
the sum of numbers in the top row to the left of column 8 is equal to 6.

Computation by the rules listed above proceeds from left to right.
Before applying the computation rules to any given number, we strike out its
appropriate entry in the B-chart. Thus, for any given number in $R_2$, the chart
will show how many numbers to the right of that number in $R_2$ are larger than,
smaller than, or equal to the given number. Thus rapid computation for rules
(1), (2) and (4) is possible. Rules for using the B-chart follow:


(5) For any given non-bracketed number in $R_2$, enter the B-chart
under the column-head with that number, and strike out
the topmost visible figure in the column. The sum of topmost
visible figures (i.e., not struck out) in all columns to the
right of that entered gives the number of agreements for
rule (1). The sum of topmost visible figures in all columns to
the left of that entered gives the number of inversions for
rule (2). In the column entered, the topmost visible figure
(after the strike-out) gives the number of zeros for rule (4).

(6) Before computing by rules (1), (2) and (4) for any given
bracketed number in $R_2$, enter the B-chart under the
column-head with that number and strike out the topmost visible
figure in the column. Repeat for each number in the bracket
until all appropriate strike-outs are made. Computations
then proceed as in rule (5).


Use of rules (5) and (6) will be illustrated with the second example. The
first number in $R_2$ is 4. By rule (5) the entry 1 in column 4 of the B-chart is
struck out. Topmost visible figures in columns to the right of 4 are summed,
giving 9 agreements by rule (1). Summing to the left gives 3 inversions by
rule (2). There are no zeros by rule (4). The second number in $R_2$ is 5. By
rule (5) the entry 1 in column 5 is struck out. Summing to the right gives 8
agreements by rule (1). Summing to the left gives 3 inversions by rule (2).
There are no zeros by rule (4). The third number in $R_2$ is 2, and it is bracketed.
By rule (6) the entry 3 in column 2 of the B-chart is struck out. Other
numbers in the same bracket are 6, 8, 8 and 8. The entry 1 in column 6, and the
entries 3, 2 and 1 in column 8 are all struck out. Just before computing for
the third number in $R_2$, the B-chart would look like this:
Rank or mid-rank:         2   4   5   6   8   11   13

Number of individuals:    -   -   -   -   -    3    1
                          2               -    2
                          1               -    1


For the third number in $R_2$, which is 2, summing to the right gives 4
agreements by rule (1); there are no inversions by rule (2); there are two zeros
by rule (4), as given by the topmost visible figure in column 2 itself.

Rule (3) does not require use of the B-chart. In a bracketed set of t
numbers in $R_2$, the ith number generates t − i zeros by rule (3). This number
of zeros is entered directly in row $Z_1$.
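
The B-chart procedure can be sketched in code by treating each column as a counter of individuals not yet struck out, so that the "topmost visible figure" is simply the remaining count. This is a minimal sketch under that reading of rules (3), (5), and (6); the function name is illustrative, and the $R_1$ values serve only to mark the brackets.

```python
from collections import Counter


def count_with_b_chart(r1, r2):
    """Return (P, Q, Z1, Z2) using a B-chart of remaining counts per rank."""
    chart = Counter(r2)      # top row of the B-chart: individuals per rank
    n = len(r2)
    p = q = z1 = z2 = 0
    start = 0
    while start < n:
        # A bracket is a run of equal values on the ordered variate r1.
        end = start
        while end < n and r1[end] == r1[start]:
            end += 1
        # Rules (5) and (6): strike out every member of the bracket first.
        for k in range(start, end):
            chart[r2[k]] -= 1
        size = end - start
        for offset, k in enumerate(range(start, end)):
            value = r2[k]
            p += sum(c for rank, c in chart.items() if rank > value)  # rule (1)
            q += sum(c for rank, c in chart.items() if rank < value)  # rule (2)
            z2 += chart[value]                                        # rule (4)
            z1 += size - 1 - offset                                   # rule (3)
        start = end
    return p, q, z1, z2


R1 = [1, 2, 5, 5, 5, 5, 5, 8, 10, 10, 10, 12, 13]
R2 = [4, 5, 2, 6, 8, 8, 8, 13, 11, 2, 2, 11, 11]

if __name__ == "__main__":
    print(count_with_b_chart(R1, R2))
```

Each individual now requires one pass over the columns of the chart rather than a comparison with every individual to its right, which is where the saving for large samples with many ties comes from.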


Formulas for Ties 


The method described above yields P and Q, components of the numerator
for $\tau$. Kendall gives two alternative denominators for use with ties. The first,
$\tau_a$, implies an untied criterion ranking. Ties indicate departures from this
criterion even as inversions do. Hence the denominator remains n(n − 1)/2.
The second, $\tau_b$, does not imply a criterion. The presence of ties simply
reduces the maximum possible number of agreements. The denominator is
accordingly reduced and made equal to the geometric mean of $P_{\max}$ for $R_1$
and $P_{\max}$ for $R_2$. For any set of n untied ranks, $P_{\max}$ = n(n − 1)/2, and
$Q_{\min}$ = 0. If there are ties $Q_{\min}$ is still 0, but every pair of numbers within a
tie generates a zero, and thereby reduces $P_{\max}$. Within any tie, the number
of such zeros is t(t − 1)/2. Hence for $R_1$, $P_{\max} = n(n - 1)/2 - \sum_t [t(t - 1)/2]$,
where for each tie, t is the number of ranks tied, and $\sum_t$ means summation
over all sets of ties in $R_1$. If we label the ties in $R_2$, u, then $P_{\max}$ for $R_2$ =
$n(n - 1)/2 - \sum_u [u(u - 1)/2]$. Then,


(4) $\tau_b = \frac{P - Q}{\sqrt{[n(n - 1)/2] - T}\,\sqrt{[n(n - 1)/2] - U}}$,


where

$T = \sum_t t(t - 1)/2$,
$U = \sum_u u(u - 1)/2$.


For our second example, T = (5·4/2) + (3·2/2) = 13; U = (3·2/2) +
(3·2/2) + (3·2/2) = 9; and


$\tau_b = \frac{41 - 19}{\sqrt{78 - 13}\,\sqrt{78 - 9}} = \frac{22}{\sqrt{65}\,\sqrt{69}} \approx .33$.
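
The two denominators can be compared directly. The following is a minimal sketch, assuming the definitions of T and U given above and the counts of the second example (P = 41, Q = 19, n = 13, ties of 5 and 3 in $R_1$, three ties of 3 in $R_2$); the function names are illustrative.

```python
from math import sqrt


def tie_term(tie_sizes):
    """Sum of t(t - 1)/2 over all sets of ties in one ranking."""
    return sum(t * (t - 1) // 2 for t in tie_sizes)


def tau_a(p, q, n):
    """Denominator left at n(n - 1)/2 (untied criterion ranking implied)."""
    return (p - q) / (n * (n - 1) / 2)


def tau_b(p, q, n, ties_r1, ties_r2):
    """Denominator is the geometric mean of P_max for the two rankings."""
    pairs = n * (n - 1) / 2
    return (p - q) / (sqrt(pairs - tie_term(ties_r1)) * sqrt(pairs - tie_term(ties_r2)))


if __name__ == "__main__":
    print(round(tau_a(41, 19, 13), 3), round(tau_b(41, 19, 13, [5, 3], [3, 3, 3]), 3))
```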
Application of $\tau$ to Large Samples


The procedure described in this paper may be used with large samples of 
any kind of appropriate data. However, illustrations will here be confined to 
investigations using subjective metrics. 

Butler and Fiske [2] have recently argued for increased use of subjective 
metrics in personality assessment. They point out [2, p. 332] that a card-sort, 
as used in Stephenson’s Q-technique [4], can be regarded as a ranking with a 
fixed number of ties. They assert that $\tau$ correlation is an appropriate method
of analysis for such sorts.

When cards are forced into a normal distribution by the experimental
instructions it seems immaterial whether the product moment or the $\tau$
correlation procedure is employed. In some investigations, however, it is
desired to use a forced rectangular distribution, or some other forced shape
of distribution which may or may not satisfy the requirements for using a
product moment coefficient. In other investigations maximum reduction of
experimental constraint may be required, so that the subject is left free to
distribute the cards in any way he pleases. Under such conditions the $\tau$
coefficient provides an appropriate test for hypotheses concerned with
correlation.

In Table 1 a B-chart is presented for the case of forced-normal sorts of 64 
cards. It will be noted that the column heads are pile numbers instead of ranks. 
The subject is required to sort the cards into 7 piles with the given distribution. 
The metric provided by the instruction may be ‘from most to least signifi- 
cant for you.” The basic order relations between cards are therefore given by 
the pile numbers, and it is unnecessary to transform these to ranks. 

Suppose $R_1$ and $R_2$ on the 64 cards are set out as for our second example.
Suppose the first number in $R_2$ is 4. Then, following rule (5), there are
15 + 6 + 1 = 22 agreements by rule (1); there are 15 + 6 + 1 = 22
inversions by rule (2); and there are 19 zeros by rule (4). Instead of taking
n − 1 = 63 observations of the relations between the first member and all
other members, only three readings are made from the chart in Table 1.
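
The three readings can be reproduced with a counter over piles. In this sketch the pile frequencies 1, 6, 15, 20, 15, 6, 1 are an assumption inferred from the figures quoted above (15 + 6 + 1 on each side of pile 4, 19 zeros after one strike-out, 64 cards in all), since the body of Table 1 is not reproduced here; the function name is illustrative.

```python
# Assumed forced-normal distribution of 64 cards over 7 piles.
PILE_COUNTS = {1: 1, 2: 6, 3: 15, 4: 20, 5: 15, 6: 6, 7: 1}


def readings_for_first_card(pile_counts, pile):
    """Agreements, inversions, and rule-(4) zeros for an unbracketed first card."""
    chart = dict(pile_counts)
    chart[pile] -= 1                                   # strike out the card itself
    agreements = sum(c for p, c in chart.items() if p > pile)
    inversions = sum(c for p, c in chart.items() if p < pile)
    zeros = chart[pile]
    return agreements, inversions, zeros


if __name__ == "__main__":
    print(sum(PILE_COUNTS.values()), readings_for_first_card(PILE_COUNTS, 4))
```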

Where work is being done with forced-sorts, one blank chart of the kind 
shown in Table 1 can be mimeographed for all correlations on sorts having the 
given distribution. When the investigation is concerned with distribution-free 
sortings, individual charts must be prepared. For extensive work of this kind, 
however, it is possible to set up a generalized chart with rows r = (the maxi- 
mum likely number of cards sorted into any one pile), and columns c = (the 
maximum likely number of piles). Every column is then filled with numbers 
descending from r successively by 1. For a particular correlation the B-chart 
is then drawn in ink. This procedure is exemplified in Table 2. 

The distribution outlined in Table 2 gives the B-chart for a hypothetical
sorting of 26 colors in terms of preference. Using this chart, and machine
summation across the rows of P, Q, and Z, the computation of $\tau$ for two
unforced sortings of 26 items can be done in less than 10 minutes. The entire
routine is illustrated in Table 3.


TABLE 1

B-chart for a Forced-Normal Sort of 64 Items
ITERATIVE INVERSE FACTOR ANALYSIS—A RAPID METHOD
FOR CLUSTERING PERSONS 


BERNARD M. Bass 


LOUISIANA STATE UNIVERSITY 


By interchanging persons and items, iterative inverse factor analysis 
provides a relatively inexpensive way of clustering persons according to 
their patterns of response to the items. In addition to permitting the cluster- 
ing of large numbers of persons, the technique enables one to determine 
the bases for such clustering. The items of behavior used can be heterogeneous 
in content and form. 


Wherry and his associates have described iterative procedures for factor- 
analyzing large numbers of test items [1, 2, 3]. They suggest that the iterative
approach would yield the same factor structure as the traditional multiple- 
centroid procedure. Moreover, the iterative approach would provide the 
factor loadings of each item rather than merely the loadings of each total 
test score. 

The original development involves the following procedures for a test 
of dichotomously scored items: 

1. Obtain a total score $X_1$ on all items for each subject.

2. Obtain the tetrachoric correlation coefficient between each item and
the total score.

3. Select those items to form pool 1 which correlate highest with the
score $X_1$.

4. Obtain a total score $X_2$ using all items less those items in pool 1.

5. Repeat steps 2 and 3 to obtain pool 2.

6. Iterate until all communality among items has been accounted for
by the pool scores.

7. The pool scores lack independence; but the obtained oblique factor
matrix can be rotated to simple structure, and, if desired, to an orthogonal
solution.


The present article proposes that all the advantages of the iterative 
technique can be applied to inverse factor analysis by interchanging subjects 
and items. Iterative inverse factor analysis provides a means for clustering 
persons according to their response patterns. The behavior assessed can be 
measured in a variety of ways, and both sample size and the scope of behavior 
studied can be increased greatly with relatively little increase in analysis time. 
Additional advantages may include the following: (1) The "factorial
composition" of each individual person can be obtained, limited, of course,
to the variety of behaviors assessed. (2) Persons can be clustered according
to the pattern of their specific dichotomous responses without any initial
assumptions about item relations. (3) Large numbers of persons and items
can be studied. (4) The items can be mixtures of any kind of attributes and
behaviors. Dichotomously grouped and quantitative data can be used
simultaneously. (5) Specific items of behavior, the source or basis of "person
clusters," can be determined.

Inverse iterative factor analysis proceeds as follows: For N dichotomously
scored items (i.e., accept-reject):


1. Obtain the frequency with which all subjects responded accept to 
each specific item. 

2. Order all items according to the frequency with which the accept 
response was given. Divide this item distribution into two equal parts, an 
upper and lower half. 

3. Key each item according to whether it is in the upper or lower half 
of the distribution of acceptance. 

4. Determine the frequency ($X_u$) with which a given subject responded
accept to items in the upper half of the distribution.

5. Determine the frequency (X) with which the same subject responded
accept to all N items.

6. For the given subject, enter $X_u$ and X in Table 1 as shown, where:


(a) $X_u$ is the number of times the given subject responded accept to
the half of the items to which all subjects responded accept most
frequently.

(b) X is the frequency with which the given subject responded accept 
to all items. 

(c) N is the total number of items, constant for all subjects. 


7. Obtain the totals N/2 for each row of Table 1, by dividing in half 
the total number of items N. Obtain the number of items to which the given 
subject responded reject N — X by subtraction. Complete the remaining 
cells of the four-fold table by subtraction. 

8. Obtain the tetrachoric correlation between the given subject’s and 
all subjects’ tendencies to respond accept to the same N items, using the 
data of Table 1. 

9. Order all subjects according to their respective tetrachoric correlations
obtained in step 8. Select those with the highest correlations for pool 1
using an arbitrary cut-off value, for example, the lowest correlation
significantly different from zero at the 1 per cent level.

10. Repeat steps 1 through 9 after eliminating subjects in pool 1 from
the sample. X remains constant for a given subject. Note that N remains
constant for all subjects during the iterations, while $X_u$ varies with each
successive iteration for each individual. None of the items nor any of the
responses are ever eliminated from consideration as the iteration proceeds.

11. Continue iterations until the correlations between each person and
each pool of persons have been obtained. This is the unrotated oblique
factor matrix.

12. Rotate to simple structure.


TABLE 1
Table for Computing Tetrachoric Correlation Between
a Given Subject and All Subjects

                                      A Given Subject's Responses
                                   "Accept"     "Reject"             Both
Upper 50% of Items "Accepted"
by All Subjects                    $X_u$        N/2 − $X_u$          N/2
Lower 50% of Items "Rejected"
by All Subjects                    X − $X_u$    N/2 + $X_u$ − X      N/2
All Items                          X            N − X                N
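
The cells of Table 1 follow from $X_u$, X, and N alone, as steps 6 and 7 describe. The sketch below builds the four cells and then estimates the correlation with the classical cosine-pi approximation to the tetrachoric coefficient. That approximation is swapped in here purely for illustration; the text does not say how the tetrachoric correlations were computed, and the function names are hypothetical.

```python
from math import cos, pi, sqrt


def fourfold_cells(x_u, x, n):
    """Cells of Table 1 for one subject: (a, b, c, d).

    a: upper-half items accepted      b: upper-half items rejected
    c: lower-half items accepted      d: lower-half items rejected
    """
    half = n / 2
    return x_u, half - x_u, x - x_u, half + x_u - x


def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation: cos(pi / (1 + sqrt(ad / bc)))."""
    if b * c == 0:
        # Limit of the formula as bc -> 0; undefined if ad is also zero.
        return 1.0 if a * d > 0 else float("nan")
    return cos(pi / (1 + sqrt(a * d / (b * c))))


if __name__ == "__main__":
    cells = fourfold_cells(x_u=40, x=50, n=100)
    print(cells, round(tetrachoric_cos_pi(*cells), 3))
```

In an iteration, only $X_u$ needs to be recomputed for each remaining subject after the pool is removed and the items are re-keyed, since X and N stay fixed.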


Adding subjects to the analysis merely serves to augment the work load
arithmetically, whereas a geometric increase would be involved if traditional
approaches were used. Traditionally, the clustering of, say, 100 persons would
require approximately four times the work of clustering 50 persons. Clustering
1,000 persons would be unmanageable for most experiments, since the work
would become 400 times as great as for clustering 50 persons. Moreover,
assuming that IBM cards or scoring sheets are employed in the procedures
outlined here, adding items of behavior to be scored entails relatively little
work.

REFERENCES 

[1] Wherry, R. J. and Gaylord, R. H. The concept of test and item reliability in relation
to factor pattern. Psychometrika, 1943, 8, 247-269.

[2] Wherry, R. J., Perloff, R., and Campbell, J. T. An empirical verification of the
Wherry-Gaylord iterative factor analysis procedure. Psychometrika, 1951, 16, 67-74.

[3] Wherry, R. J. and Winer, B. J. A method for factoring large numbers of items.
Psychometrika, 1953, 18, 161-179.

Manuscript received 7/25/55 




Revised manuscript received 1/16/56 








